In Java same algo not working import java.util.*; class Solution { public boolean wordBreak(String s, List wordDict) { boolean[] dp = new boolean[s.length()+1]; Arrays.fill(dp, false); dp[s.length()] = true; System.out.println(Arrays.toString(dp)); for(int i = s.length(); i >= 0; i--) { for( String w: wordDict) { if( i + w.length()
@@sunilgadhetharia2926 you need to use equals in Java to compare strings s.substring(i, i + w.length()) == w should be s.substring(i, i + w.length()).equals(w) if you use equal, it is like comparing the pointers of the variables, not their contents
Awesome Explanation. You are the only guy who explains DP solutions with ease. I'm slowly getting comfortable with DP just because of your tutorials. Thank you so much.!!!
In my opinion, even the first brute force approach will require backtracking. Assume the wordDict = ["l", "leet", "code"], s = "leetcode". So now if we start matching then the first character "l" will match with wordDict[0] and now we will start searching for remaining "eetcode" and we won't find a match which is wrong. Hence we will have to backtrack and select "leet" instead of "l" to start with.
yep, i agree with you on this one, but it won't need backtracking, it can just be recursion. the time complexity will be pretty crazy. something like n^m, (more or less.)
How do we decide whether to start the DP table from the end or from the start? Is there anything particular in this problem that made you think we have to start from dp[8] as the base case and not dp[0] for the word "neetcode"?
Complexity appears incorrect (for the brute force). Each time a word is encountered, we have to break into 2 subproblems - we can either take the word, or continue without it. So it's exponential (2^n)
It gets split into two substrings, but only the right side is a subproblem which continues splitting. The left substring just gets checked against the dictionary as is, and only when it is present do we take the right substring as a subproblem. So it isn't 2^N in brute force.
@@jacoballen6099 You can certainly get all substrings in n^2. However, each time you will be comparing two substrings only, while the string can be formed of 3 or more words
def wordBreak(s: str, wordDict: List[str]) -> bool: # Convert the wordDict to a set for faster lookup wordSet = set(wordDict) # Create a dp array of size s + 1 dp = [False] * (len(s) + 1) # Set the first element of the dp array to True dp[0] = True # Iterate through the dp array Bottom Up for i in range(1, len(dp)): # Iterate through the dp array again for j in range(i): # If the dp[j] is True and the substring from j to i is in the wordSet if dp[j] and s[j:i] in wordSet: # Set the dp[i] to True dp[i] = True # Break out of the loop break
# Return the last element of the dp array return dp[-1]
Nice. This is similar but just explicitly codes in how we're marking valid word breaks with 'True'. class Solution: def wordBreak(self, s: str, wordDict: List[str]) -> bool: dp = [False] * (len(s) + 1) dp[0] = True
for i in range(1, len(s) + 1): for w in wordDict: if dp[i - len(w)] and s[i - len(w) : i] == w: dp[i] = True return dp[-1]
It only slightly improves computation time. Doing this will change wordDict lookup from O(w) to O(1), where w is the size of our wordDict. However for this problem, 'w' is restricted between 1 and 12, so at worst wordDict lookup as a list will be O(12) = O(1). There are no duplicate words in the wordDict. This is specified in the problem statement.
@@kockarthur7976 Odd because it says now word dict length is 1 to 1000, and that also means the initial proposition of neetcode of going over possible substrings O(n^2) which is actually O(nm) because the length bound on a dict word is 20 So the initial ends up with O(nm^2) vs. O(nwm) and m is much smaller than w Maybe they changed the question/tests recently?
Hey correct me if I am wrong but the approach that you did would be O(n^2*m) since you are slicing the string s as well, if you were to slice from 0 to len(s), wouldn't that be running in O(n) time? so the outer for loop is O(n) then inside you iterate through the word dict which is O(m) then inside the word dictionary you are slicing the string with worst case slicing be O(n) => O(n*m*n). Does it sound correct with you?
I think this is just O(n*m) because the description says that the max length of a word is 20 characters. This means that s[i:i+len(word)] is considered O(1) time.
I don't think solution will work for every case. What if the word dictionary for the same neetcode would be [nee, leet, code, ode, ne, etcode]. As you can see the word can be made from ne+etcode but the algo will look "code" and then rest of the "neet" or word break of neet. I think the problem is that once it find a smaller substring "code" in the word dic it stops looking for any super set eg "etcode" that could also be found in the dictionary. Is my understanding correct.
If you use that as a test case in leetcode it works. Becuase j is starting at 0 every time, eventually i will be at s.length and j will be at 0. Then j will increment until it reaches "ne" which will be true since it's found in the dict. Then the next check will see that a substring from j-i is also found in the dict, "etcode", and will set the last value in the dp array to true. Remember it doesn't truly care about what words it has found so far, only if the last value in the dp array is true or not. The other words only serve as places that the algorithm can check against. You would be right if we were removing words from the dict as we found them in the string, but we're checking the whole dict every time and the algorithm checks every combination of substrings.
@@joereeve2569 The solution from the video does not pass on Leetcode, they must have fixed the test case. You state that it checks every word in the dictionary everytime, but we can see clearly in the code @neetcode provided he breaks the loop when a match is found. When I run this algorithm with this input: "cars" ["car","ca","rs"] it matches "rs" at index 2, then it gets down to index 0 and matches "car", which causes the test case to fail since the "r" cannot be used twice. He is not keeping track of the index that was last matched. If I modify his algorithm to keep track of that information, it passes the above test case, but then in turn fails on this test case: "aaaaaaa" ["aaaa","aaa"]
@@jshpro3 Hey Josh, I went and tested my algorithm again to see if it failed against the fixed test cases you mentioned and it still passed. It also returns true for both test cases you just described. It's been so long since I did this I can't remember if I followed this video exactly, but I'll paste my code (it's in kotlin) here for you to see. Let me know if I'm missing anything. class Solution { fun wordBreak(s: String, wordDict: List): Boolean { val dp = Array(s.length + 1) {false} dp[0] = true for (i in 0..s.length) { for (j in 0 until i) { if (dp[j] == true && wordDict.contains(s.substring(j until i))) { dp[i] = true break } } } return dp[s.length] } }
Does it matter if I solve a problem in Top-Down approach and not Bottom-Up approach in an interview if the approach is not specified by the interviewer?
Wouldn't the brute force approach mentioned in 1:26 not work in this test case: s = "catsanddog" wordDict = ["cats","cat","and","dog"]. The algorithm would recognize "cat" as being in wordDict and go straight to looking at "sanddog" which isn't a valid word from the wordDict. Instead, it should see "cats" and "anddog" as two separate subproblems. Is there a workaround for this edge case?
To anyone who has confusion here, what I like to do is transform the solution into english (or whatever your comfortable language is). In the bottom up solution here, what we are basically doing is checking two things : 1) wherever we have reached, do we have enough space after our pointer that any word in wordDict can fit there. If yes, then is the word right next to our pointer the same word in wordDict? 2) If the answer for (1) is yes, then we know that from our current point, we have a word towards our right which exists in wordDict, great! But, can everything after our current pointer be found in the wordDict? So we check, can the word which starts just after our current word that we found (at i + len(word)) be found in the wordDict? if yes, then we put a TRUE on our current pointer as well, because we can split the string after our current pointer such that the words exist in wordDict. we continue this till the end and return dp[0], which basically states that everything after this point can be split such that all splitted words exist in wordDict. Hope this helps!
I don't understand why we break it if dp[i] , what happens if two words match the suffix? PS: if anyone didnt understand like me I think once we have dp[i] as True we dont need to check for rest of the words in wordDict since its anyways True, so its sort of a little optimization to break the for loop early
@@tonyiommisg I think once we have dp[i] as True we dont need to check for rest of the words in wordDict since its anyways True, so its sort of a little optimization to break the for loop early
Interestingly, looking at the bounds of the question they may have changed. Now the length of the string (n) is 300 No. of dict words is 1000 (w) Length of a dict word is 20 (m) This means the initial proposition of finding every substring O(n^2) which is actually O(nm) because you only need substring of length 1 to m vs. Going through every dict word at every index O(nw) Both also need an extra factor of m to actually check/get the substring but still technically the first (whilst also using a set) is technically better
okk I think no is taking about my apporoach. What i did was use a trie data structure which has EXTREMELY good lookup for word prefixes. Thus at the cost of some space, i was able to solve this problem in a very efficient way
In python, slicing with out of bound indices does not throw an error. It automatically returns until the end of the string even when ending index is out of bounds. So we may remove the size check
Potential reason to use a Trie: - narrow down the list of words to iterate over As the list of words is considerably small the optimization might be an overkill. However the case where a Trie would make perfect sense is if we were operating on the same set of words but checking different strings. Eg: wordDict = Entire english dictionary (approx. 700K words), checking multiple strings against the same dictionary.
your explanation clarifies all my doubts about the solution. Thanks for creating this awesome resource. btw what tool do you use to explain with free hand sketching in different colors, is it done through any specific software? thanks again
Brilliant. Took me two days to understand this. One thing I do not get, how do we know for certain we do not need to compare all the words in the dictionary -- we return immediately from dp[i] break;
Not sure if helpful but here is the solution flipped around so its top down, this might be even more confusing: class Solution: def wordBreak(self, s: str, wordDict: List[str]) -> bool: dp = [False] * (len(s) + 1) dp[0] = True for i in range(1, len(s)+1): for w in wordDict: if (i - len(w)) >= 0 and s[i - len(w): i] == w: dp[i] = dp[i - len(w)] if dp[i]: break return dp[len(s)]
Very nice explanation. I'm just confused with 1 thing how come time complexity is O( n^2 m) so ideally there are just 2 loops one iterates over input string and inner loop iterates over each dic word and does the comparison, so complexity should be O(nm) (n= len of string) (m = max length of word in dict). What am I missing here?
Can we use a Trie ro solve this, if we build a Trie from the word dict, the. Check and eliminate the chars from the given string where endofWord is true?
Great explanation. I have one question, I still am not sure why for i in range(len(s)-1, -1, -1): for w in wordDict: has a time complexity O(n*m*n) where n=len(s) and m=len(wordDict). Why is it not O(n*m)?
Late response here, so you may have already figured it out (and I hope someone corrects me here if I am wrong): I believe that last n in the big O notation is because that is the time it takes to compare both strings against each other. Comparing strings is a linear time operation I believe. So when we are checking to see if one of the words in wordDict is contained within our input string s, its possible that the word in wordDict could potentially have the same length as s. So in the worst case if the word in wordDict does in fact have the same length as s, (and we are implying that n = s.length), then we are comparing two strings of length n against each other, which would equate to the last n you are seeing in the Big O notation. Because we could potentially be doing that computation/comparison n*m times. Hope that makes sense. :)
This is a bit late, but the point is that if you add the length of the string, in this case "code" from index 4, it should end up on length of the string. Meaning it's +1 greater than array. So it would be dp[4] is equal to dp[4+4] which is dp[8], our base case. so then we match dp[0] with neet which gets it's value from dp[4]
@@NeetCode thank you for replying absolute honour, your videos are a life saver for an aussie trying to get a grad job! did you prefer to start from i=0 when doing dp personally?
The logic can be implemented with a Trie. I passed 35/47 testcases with a Trie based solution. But then got a TLE. Then I added DP to my Trie based solution and passed all testcases with Time Complexity better than 70% solutions.
i think that solution could even possibly be something like n^m (more or less). imagine you have s as "aaaaaaaaaab", and list as [a,aa,aaa,aaaa,aaaaa, b,c,d,e,f,g,h,i,j,k,l,m,n, ....], what would be the time complexity for this one? if you break it down, each steps, you check if there's a match, we can use hashmap, that could be O(1). but you have to do it n^2 times to check all possible substring with the hashmap, and each time there's a match, you might have to do the n^2 time again, so there will be maximumly max(m, n) depth. each depth has n^2 times, that gives about n^(2*m) times. and the algorithm is probably a little faster than this, cos this is the time complexity if there's always a match we could possibly do. but yepp, worst scenario is probably something like this.
class Solution: def wordBreak(self, s: str, wordDict: List[str]) -> bool: words = set(wordDict) sizes = set([len(ele) for ele in words]) n = len(s) dp = [False] * (n + 1) dp[0] = True for i in range(n): for size in sizes: if i + size
An optimization would be to first check if dp[i+len(w)] is true before you go the compare. This way you might save some time comparing strings that even if you find it much, you are going to put a false in dp[i]
i think it is better taking first letter and size from the selected word of the dictionary ,then check the word in the string .. instead of checking all the strings of size n
the main thing that confused me was `dp[i] = dp[i+len(w)]`, but now i understand it: it's just: If the word at index i to i +len(w), matches with some word w in dictionary: --> Then the T/F of dp[i] degrades to/depends on the T/F of dp[i+len(w)]
If we start solving the problem from the first word then there’s no need of the cache step. One directly jumps to the 5th word and then solve the remaining recursively. Why force dp on this problem?
Can you please help me understand something??. What if instead of "leet", the word in the dictionary was "eet", or "et ", and the same in case of "code ", like say "de", or, "ode"..? How would we approach then?
top-down solution I made after watching this vid since bottom-up felt unintuitive for me. class Solution: def wordBreak(self, s: str, wordDict: List[str]) -> bool: dp = [False] * (len(s)+1) dp[0] = True for i in range(1,len(s)+1): for w in wordDict: if s[i-1:i-1+len(w)] == w and dp[i-1]: dp[i-1+len(w)] = True return dp[len(s)]
Not only we can break early, but we must break early. This is not just an optimization. Otherwise, test case: s = "abcd" and wordDict = ["a","abc","b","cd"] will fail. The reason is, once we find "a", (and i + len(word)) to be True, that is a path. But if we move on, and we find "abc", then i + len(word) for "abc" will have us be dependent on there being a "d" in the dictionary, which there isn't. So, it is important to stop as soon as we found a path out, and not do an exhaustive search.
any reason Tries can 't be used in this problem , apart from cost of building the Trie,, ,, I can see the code implementation trying to match word from last index , do we have any advantage in this approach over starting from the first index ?
Why dont we put the wordDict into a set and get constant lookup? The question states that len(wordDict) > len(s), So isn’t it better to optimize for worddict lookup?
Not sure if bruteforce would pass case like, s = "aaaaaaaa" wordDict = ["aaa", "aaaa"] because if you start from wordDict[0], it can be split into "aaa - aaa - aa" (false), but if you start from wordDict[1], it's "aaaa - aaaa" (true).
Can someone explain the time complexity of the DP problem to me? Where M = len(s) and N = len(wordDict), the time complexity would be O(M*N) correct? I see others saying it would be O(M*N^2) because of the list slicing. Is that or is that not true and why?
@@guynameddan427 Comparing 2 strings in worst case will be O(n) time complexity. -> e.g here in the above example if the complete word "neetcode" itself is present in the wordDict.
Is it a good idea to further optimize this algorithm by using a hashmap of arrays to store the list of words hashed by the starting alphabet? This would reduce the search complexity
🚀 neetcode.io/ - A better way to prepare for Coding Interviews
In Java same algo not working
import java.util.*;
class Solution {
public boolean wordBreak(String s, List wordDict) {
boolean[] dp = new boolean[s.length()+1];
Arrays.fill(dp, false);
dp[s.length()] = true;
System.out.println(Arrays.toString(dp));
for(int i = s.length(); i >= 0; i--)
{
for( String w: wordDict)
{
if( i + w.length()
@@sunilgadhetharia2926 you need to use equals in Java to compare strings
s.substring(i, i + w.length()) == w should be s.substring(i, i + w.length()).equals(w)
if you use equal, it is like comparing the pointers of the variables, not their contents
var findRepeatedDnaSequences = function(s) {
let count={};
for(let left=0;left=2)
res.push(k)
}
return res;
};
@NeetCode can you explain the time complexity of the dp solution ?
Awesome Explanation. You are the only guy who explains DP solutions with ease. I'm slowly getting comfortable with DP just because of your tutorials. Thank you so much.!!!
In my opinion, even the first brute force approach will require backtracking. Assume the wordDict = ["l", "leet", "code"], s = "leetcode". So now if we start matching then the first character "l" will match with wordDict[0] and now we will start searching for remaining "eetcode" and we won't find a match which is wrong. Hence we will have to backtrack and select "leet" instead of "l" to start with.
exactly what I thought
For eetcode dp[0+1] will be false so loop will continue until it finds leet where dp[0+4] was previously set true as code was found.
yep, i agree with you on this one, but it won't need backtracking, it can just be recursion. the time complexity will be pretty crazy. something like n^m, (more or less.)
@@quirkyquester Yup, that's what I did and then you just get TLE
These are the best leetcode videos, no doubt. Thank you for the consistent and clear explanations.
Doing great, soon you will be the first leetcoder to have a "teamblind 75" playlist!
you mean a neetcoder 😃
i was confusion till i watched this back a few times, I finally get the idea of why you are doing it this way. THANK YOU!!!
Super crisp explanation. I understand your videos in just 1 go.
These days I am searching for 'Leetcode # neetcode' in YT.
Thanks dude.
same my thoughts
same!!
typing "neetcode #" is easier 😄
Amazing videos. You've changed how I understand and think through problems instead of just memorization... Thank you!!!!
How do we decide whether to start the DP table from the end or from the start? Is there anything particular in this problem that made you think we have to start from dp[8] as the base case and not dp[0] for the word "neetcode"?
can start from anywhere, it doesn't matter
I came up with a recursive top-down solution, using the substrings as the keys in the dp cache. It seems to be pretty much equally efficient.
Complexity appears incorrect (for the brute force). Each time a word is encountered, we have to break into 2 subproblems - we can either take the word, or continue without it. So it's exponential (2^n)
It gets split into two substrings, but only the right side is a subproblem which continues splitting. The left substring just gets checked against the dictionary as is, and only when it is present do we take the right substring as a subproblem. So it isn't 2^N in brute force.
@@toolworks It is 2^N. There's a proof in the discuss section in Leetcode Website for this problem
@@mandy1339 It is not 2^n.
You can make all substrings in O(n^2). You can make all sub sequences in O(2^n).
This problem is not 2^n
@@jacoballen6099 You can certainly get all substrings in n^2. However, each time you will be comparing two substrings only, while the string can be formed of 3 or more words
Neetcode you are god. The way you teach is next level. Harvard should hire you
Great job! Please do Word Break II as well. Curious to see if we can use the same bottom up dp approach to store all possible sentences.
def wordBreak(s: str, wordDict: List[str]) -> bool:
# Convert the wordDict to a set for faster lookup
wordSet = set(wordDict)
# Create a dp array of size s + 1
dp = [False] * (len(s) + 1)
# Set the first element of the dp array to True
dp[0] = True
# Iterate through the dp array Bottom Up
for i in range(1, len(dp)):
# Iterate through the dp array again
for j in range(i):
# If the dp[j] is True and the substring from j to i is in the wordSet
if dp[j] and s[j:i] in wordSet:
# Set the dp[i] to True
dp[i] = True
# Break out of the loop
break
# Return the last element of the dp array
return dp[-1]
Wow!! Amazing explanation. Thank you so much! I learned a lot from your videos. You're a hero!! Please keep uploading these quality videos!
You're so smart, I learned a lot from your channel.
Nice. This is similar but just explicitly codes in how we're marking valid word breaks with 'True'.
class Solution:
def wordBreak(self, s: str, wordDict: List[str]) -> bool:
dp = [False] * (len(s) + 1)
dp[0] = True
for i in range(1, len(s) + 1):
for w in wordDict:
if dp[i - len(w)] and s[i - len(w) : i] == w:
dp[i] = True
return dp[-1]
Thanks this helped!
Adding following line at the start will improve total time required
wordDict = set(wordDict)
As there are so many duplicate words in some test cases
It only slightly improves computation time. Doing this will change wordDict lookup from O(w) to O(1), where w is the size of our wordDict. However for this problem, 'w' is restricted between 1 and 12, so at worst wordDict lookup as a list will be O(12) = O(1).
There are no duplicate words in the wordDict. This is specified in the problem statement.
@@kockarthur7976 Odd because it says now word dict length is 1 to 1000, and that also means the initial proposition of neetcode of going over possible substrings O(n^2) which is actually O(nm) because the length bound on a dict word is 20
So the initial ends up with O(nm^2) vs. O(nwm) and m is much smaller than w
Maybe they changed the question/tests recently?
Thank You So Much NeetCode Brother...........🙏🏻🙏🏻🙏🏻🙏🏻🙏🏻🙏🏻
Thanks!!You saved my day, i was struggling for many hours until i watched this video!
Hey correct me if I am wrong but the approach that you did would be O(n^2*m) since you are slicing the string s as well, if you were to slice from 0 to len(s), wouldn't that be running in O(n) time? so the outer for loop is O(n) then inside you iterate through the word dict which is O(m) then inside the word dictionary you are slicing the string with worst case slicing be O(n) => O(n*m*n). Does it sound correct with you?
I think this is just O(n*m) because the description says that the max length of a word is 20 characters. This means that s[i:i+len(word)] is considered O(1) time.
omg you are the best. my saviour. tears of joy. please always make these vids.
I don't think solution will work for every case. What if the word dictionary for the same neetcode would be [nee, leet, code, ode, ne, etcode]. As you can see the word can be made from ne+etcode but the algo will look "code" and then rest of the "neet" or word break of neet. I think the problem is that once it find a smaller substring "code" in the word dic it stops looking for any super set eg "etcode" that could also be found in the dictionary. Is my understanding correct.
I was thinking about the exact same thing. like how this code can avoid that
If you use that as a test case in leetcode it works. Becuase j is starting at 0 every time, eventually i will be at s.length and j will be at 0. Then j will increment until it reaches "ne" which will be true since it's found in the dict. Then the next check will see that a substring from j-i is also found in the dict, "etcode", and will set the last value in the dp array to true. Remember it doesn't truly care about what words it has found so far, only if the last value in the dp array is true or not. The other words only serve as places that the algorithm can check against. You would be right if we were removing words from the dict as we found them in the string, but we're checking the whole dict every time and the algorithm checks every combination of substrings.
There are test cases for this in leetcode already.
@@joereeve2569 The solution from the video does not pass on Leetcode, they must have fixed the test case. You state that it checks every word in the dictionary everytime, but we can see clearly in the code @neetcode provided he breaks the loop when a match is found.
When I run this algorithm with this input:
"cars"
["car","ca","rs"]
it matches "rs" at index 2, then it gets down to index 0 and matches "car", which causes the test case to fail since the "r" cannot be used twice. He is not keeping track of the index that was last matched. If I modify his algorithm to keep track of that information, it passes the above test case, but then in turn fails on this test case:
"aaaaaaa"
["aaaa","aaa"]
@@jshpro3 Hey Josh, I went and tested my algorithm again to see if it failed against the fixed test cases you mentioned and it still passed. It also returns true for both test cases you just described. It's been so long since I did this I can't remember if I followed this video exactly, but I'll paste my code (it's in kotlin) here for you to see. Let me know if I'm missing anything.
class Solution {
fun wordBreak(s: String, wordDict: List): Boolean {
val dp = Array(s.length + 1) {false}
dp[0] = true
for (i in 0..s.length) {
for (j in 0 until i) {
if (dp[j] == true && wordDict.contains(s.substring(j until i))) {
dp[i] = true
break
}
}
}
return dp[s.length]
}
}
I think the runtime for brute-force without memoization would be O(2^n), not O(n^2). String search would cost another O(m).
Does it matter if I solve a problem in Top-Down approach and not Bottom-Up approach in an interview if the approach is not specified by the interviewer?
Amazing, no other video on youtube explains this as crisp!
You sir are a saviour, well explained and neat solution. The LeetCode solution isn't as clean as this IMO.
Wouldn't the brute force approach mentioned in 1:26 not work in this test case: s = "catsanddog" wordDict = ["cats","cat","and","dog"]. The algorithm would recognize "cat" as being in wordDict and go straight to looking at "sanddog" which isn't a valid word from the wordDict. Instead, it should see "cats" and "anddog" as two separate subproblems. Is there a workaround for this edge case?
Would a suffix trie also be a good solution here?
To anyone who has confusion here, what I like to do is transform the solution into english (or whatever your comfortable language is).
In the bottom up solution here, what we are basically doing is checking two things :
1) wherever we have reached, do we have enough space after our pointer that any word in wordDict can fit there. If yes, then is the word right next to our pointer the same word in wordDict?
2) If the answer for (1) is yes, then we know that from our current point, we have a word towards our right which exists in wordDict, great! But, can everything after our current pointer be found in the wordDict? So we check, can the word which starts just after our current word that we found (at i + len(word)) be found in the wordDict? if yes, then we put a TRUE on our current pointer as well, because we can split the string after our current pointer such that the words exist in wordDict.
we continue this till the end and return dp[0], which basically states that everything after this point can be split such that all splitted words exist in wordDict.
Hope this helps!
The brute force approach is 2^n , not n^2
I don't understand why we break it if dp[i] , what happens if two words match the suffix?
PS: if anyone didnt understand like me
I think once we have dp[i] as True we dont need to check for rest of the words in wordDict since its anyways True, so its sort of a little optimization to break the for loop early
Same. Trying to understand this part
@@tonyiommisg I think once we have dp[i] as True we dont need to check for rest of the words in wordDict since its anyways True, so its sort of a little optimization to break the for loop early
Updated the main comment with the answer as well
Interestingly, looking at the bounds of the question they may have changed.
Now the length of the string (n) is 300
No. of dict words is 1000 (w)
Length of a dict word is 20 (m)
This means the initial proposition of finding every substring O(n^2) which is actually O(nm) because you only need substring of length 1 to m
vs.
Going through every dict word at every index O(nw)
Both also need an extra factor of m to actually check/get the substring but still technically the first (whilst also using a set) is technically better
This is a great explanation - best I've seen. Thanks!
hey neetcode, in 4:32 you said the max size of word dictionary is smaller than max size of string. but i looks like its the opposite:
from lc:
1
i have the same question. the answer may not be a good answer
Actually he meant to say that each word in wordDict is smaller than the string given. But he rather said wordDict
You are doing god's work here. Thank you.
okk I think no is taking about my apporoach. What i did was use a trie data structure which has EXTREMELY good lookup for word prefixes. Thus at the cost of some space, i was able to solve this problem in a very efficient way
In python, slicing with out of bound indices does not throw an error. It automatically returns until the end of the string even when ending index is out of bounds. So we may remove the size check
Potential reason to use a Trie:
- narrow down the list of words to iterate over
As the list of words is considerably small the optimization might be an overkill. However the case where a Trie would make perfect sense is if we were operating on the same set of words but checking different strings.
Eg: wordDict = Entire english dictionary (approx. 700K words), checking multiple strings against the same dictionary.
There is no advantage in going from len(s) to 0 in this case. You can do exactly the same thing with normal iteration from 0 to len(s).
No because you set dp[i] = dp[i+lenW] so you need to calculate top down
@@jacobcutts4099You can just return dp[len(s)-1] if starting from 0.
your explanation clarifies all my doubts about the solution. Thanks for creating this awesome resource.
btw what tool do you use to explain with free hand sketching in different colors, is it done through any specific software? thanks again
4:25, actually the problem description says the opposite. wordDict is longer.
1
wordsDict[i] matters in this case not worddict.length
Brilliant. Took me two days to understand this. One thing I do not get, how do we know for certain we do not need to compare all the words in the dictionary -- we return immediately from dp[i] break;
Not sure if helpful but here is the solution flipped around so its top down, this might be even more confusing:
class Solution:
def wordBreak(self, s: str, wordDict: List[str]) -> bool:
dp = [False] * (len(s) + 1)
dp[0] = True
for i in range(1, len(s)+1):
for w in wordDict:
if (i - len(w)) >= 0 and s[i - len(w): i] == w:
dp[i] = dp[i - len(w)]
if dp[i]:
break
return dp[len(s)]
It's not :D equally confusing and less with practice and watching
Very nice explanation. I'm just confused with 1 thing how come time complexity is O( n^2 m)
so ideally there are just 2 loops one iterates over input string and inner loop iterates over each dic word and does the comparison, so complexity should be O(nm) (n= len of string) (m = max length of word in dict). What am I missing here?
once u get a match with a dict word(which is O(m*n) ), u have to repeat the entire process again with the remaining part of the word. so n * (m*n).
what if words in dict are of different lengths? would it help to sort array by length in that case?
working the dp array from dp[8] down to dp[0] should be called as top down approach? but he said bottom up?
the best explanation, as always!
Can we use a Trie ro solve this, if we build a Trie from the word dict, the. Check and eliminate the chars from the given string where endofWord is true?
Good idea!
Great explanation. I have one question, I still am not sure why
for i in range(len(s)-1, -1, -1):
for w in wordDict:
has a time complexity O(n*m*n) where n=len(s) and m=len(wordDict). Why is it not O(n*m)?
Late response here, so you may have already figured it out (and I hope someone corrects me here if I am wrong):
I believe that last n in the big O notation is because that is the time it takes to compare both strings against each other. Comparing strings is a linear time operation I believe. So when we are checking to see if one of the words in wordDict is contained within our input string s, its possible that the word in wordDict could potentially have the same length as s. So in the worst case if the word in wordDict does in fact have the same length as s, (and we are implying that n = s.length), then we are comparing two strings of length n against each other, which would equate to the last n you are seeing in the Big O notation. Because we could potentially be doing that computation/comparison n*m times.
Hope that makes sense. :)
Thank you for sharing. Could you please explain more about why dp[8] is True?
This is a bit late, but the point is that if you add the length of the string, in this case "code" from index 4, it should end up on length of the string. Meaning it's +1 greater than array. So it would be dp[4] is equal to dp[4+4] which is dp[8], our base case. so then we match dp[0] with neet which gets it's value from dp[4]
I think what you said at 4:30 is wrong. As of 10/10/2024, leetcode says 1
Thank you! Awesome thinking here!
Thanks!
Thank you so much, I really appreciate it! 😃
can someone explain why we go backwards seems like quite the trend in Neetcodes DP solutions? couldnt we do this very similarly just from index 0?
Yeah, most people usually start from i=0, that def works. I go backwards because it's closer to the drawing explanation.
@@NeetCode thank you for replying absolute honour, your videos are a life saver for an aussie trying to get a grad job! did you prefer to start from i=0 when doing dp personally?
The logic can be implemented with a Trie. I passed 35/47 testcases with a Trie based solution. But then got a TLE.
Then I added DP to my Trie based solution and passed all testcases with Time Complexity better than 70% solutions.
I actually did the same Trie based solution, interesting to see someone else try that too.
Had the same exact approach! (That test case with a series of 'a's and a trailing 'b' made me add the cache :D)
I have a question guys, why the brute-force time complexity is O(n^2) instead of O(2^n) 3:06
i think that solution could even possibly be something like n^m (more or less). imagine you have s as "aaaaaaaaaab", and list as [a,aa,aaa,aaaa,aaaaa, b,c,d,e,f,g,h,i,j,k,l,m,n, ....], what would be the time complexity for this one? if you break it down, each steps, you check if there's a match, we can use hashmap, that could be O(1). but you have to do it n^2 times to check all possible substring with the hashmap, and each time there's a match, you might have to do the n^2 time again, so there will be maximumly max(m, n) depth. each depth has n^2 times, that gives about n^(2*m) times. and the algorithm is probably a little faster than this, cos this is the time complexity if there's always a match we could possibly do. but yepp, worst scenario is probably something like this.
Phenomenal explanation. Thank you
great video
1 note: you can also start from the beginning, you just need to slightly adjust the logic
class Solution:
def wordBreak(self, s: str, wordDict: List[str]) -> bool:
words = set(wordDict)
sizes = set([len(ele) for ele in words])
n = len(s)
dp = [False] * (n + 1)
dp[0] = True
for i in range(n):
for size in sizes:
if i + size
I can only come out with brute forced solution, and get stucked what should i cache to speed up, and you explained it in a clear way, thanks.
thank you, i dont see how someone could do this without knowing the solution already
I mean I did it without checking the solution
@@ashkan.arabim ok
Isn't the runtime of the DP solution still technically O(n * m * n), where n is length of s and m is length of wordDict
Very interesting solution. Great job!
11:03 shouldnt dp[3]=dp[2]=dp[1]=0 or False? why is it eq to 1?
Yes you're right, I have no idea how I didn't notice that, sorry for the confusion.
An optimization would be to first check if dp[i+len(w)] is true before you go the compare. This way you might save some time comparing strings that even if you find it much, you are going to put a false in dp[i]
i think it is better taking first letter and size from the selected word of the dictionary ,then check the word in the string .. instead of checking all the strings of size n
not necessarily. You can still hash the comparison...
well explained! thank you very much good sir
the main thing that confused me was `dp[i] = dp[i+len(w)]`, but now i understand it: it's just:
If the word at index i to i +len(w), matches with some word w in dictionary: -->
Then the T/F of dp[i] degrades to/depends on the T/F of dp[i+len(w)]
Thank you so much. Your explanation was very helpful. I almost thought of giving up on this one; then saw your video.
If we start solving the problem from the first word then there’s no need of the cache step. One directly jumps to the 5th word and then solve the remaining recursively. Why force dp on this problem?
Thank you for your explanation! Could you also make a video of Leetcode 140 Word Break II ?
Such a fantastic problem!
Can you please help me understand something??. What if instead of "leet", the word in the dictionary was "eet", or "et ", and the same in case of "code ", like say "de", or, "ode"..?
How would we approach then?
top-down solution I made after watching this vid since bottom-up felt unintuitive for me.
class Solution:
def wordBreak(self, s: str, wordDict: List[str]) -> bool:
dp = [False] * (len(s)+1)
dp[0] = True
for i in range(1,len(s)+1):
for w in wordDict:
if s[i-1:i-1+len(w)] == w and dp[i-1]: dp[i-1+len(w)] = True
return dp[len(s)]
better approach for memoization is to start from beginning and check from a set of worddict
What a god, you are my hero dude.
5:30 what decision tree looks like and how we can cache it
Can we use a trie to speed up the string lookup?
Not only we can break early, but we must break early. This is not just an optimization. Otherwise, test case: s = "abcd" and wordDict = ["a","abc","b","cd"] will fail. The reason is, once we find "a", (and i + len(word)) to be True, that is a path. But if we move on, and we find "abc", then i + len(word) for "abc" will have us be dependent on there being a "d" in the dictionary, which there isn't. So, it is important to stop as soon as we found a path out, and not do an exhaustive search.
What's the time complexity for recursive approach with and without caching?
any reason Tries can 't be used in this problem , apart from cost of building the Trie,, ,, I can see the code implementation trying to match word from last index , do we have any advantage in this approach over starting from the first index ?
why not convert wordDict into a set first? are we trying to save memory ?
backtrack & caching were enough to get all test case passed 😎
Thank u. save my life at 11:56PM. so I can go to sleep now.
Why dont we put the wordDict into a set and get constant lookup? The question states that len(wordDict) > len(s),
So isn’t it better to optimize for worddict lookup?
If word dict size is large wouldn't trie data structure be more efficient than having to loop through all dictionary element?
awesome and clear explanation!
Superb! Thank you again for your videos!!
Could you share what drawing tool do you use for your videos? It looks quite nice
what do you use for writing on computer and drawing?
Not sure if bruteforce would pass case like, s = "aaaaaaaa" wordDict = ["aaa", "aaaa"] because if you start from wordDict[0], it can be split into "aaa - aaa - aa" (false), but if you start from wordDict[1], it's "aaaa - aaaa" (true).
Can someone explain the time complexity of the DP problem to me? Where M = len(s) and N = len(wordDict), the time complexity would be O(M*N) correct? I see others saying it would be O(M*N^2) because of the list slicing. Is that or is that not true and why?
i hate this problem
why is it called bottom up when u are starting from the top of array in reverse order?
*JAVA SOLUTION*
class Solution {
HashSet set = new HashSet();
public boolean wordBreak(String s, List wordDict) {
if(s.length() == 0) return true;
if(set.contains(s)) return false;
for(String str:wordDict){
if(s.indexOf(str) == 0){
if(wordBreak(s.substring(str.length()),wordDict)) return true;
}
}
set.add(s);
return false;
}
}
@NeetCode why is this O(n^3). 2 nested loops. Is s[i:i+len(w)] o(n) for some reason? Array access is o(1)
I was thinking the same. Isn't the substring operation constant time complexity?
@@guynameddan427 Comparing 2 strings in worst case will be O(n) time complexity. -> e.g here in the above example if the complete word "neetcode" itself is present in the wordDict.
What is the time complexity of the final solution??
Is it a good idea to further optimize this algorithm by using a hashmap of arrays to store the list of words hashed by the starting alphabet? This would reduce the search complexity
you can use a trie
hey for the word leetcode and wordDIct=[''lee","leat","leets","leet","code"] I think the approach will fail
It shouldn't fail
do you mind mention time and space complexity at the end. Thank you!
Bruteforce can be to check for each substring .
A rough memoization (cache) solution:
class Solution:
def wordBreak(self, s: str, wordDict: List[str]) -> bool:
res = [False]
memo = [True] * len(s)
def dps(memo, idx):
if idx >= len(s):
res[0] = True
return
if idx < len(s) and memo[idx] == False:
return
counter = 0
for word in wordDict:
if s[idx:idx + len(word)] == word:
idx += len(word)
dps(memo, idx)
idx -= len(word)
if s[idx:len(word)] != word:
counter += 1
if counter == len(wordDict) and idx < len(s):
memo[idx] = False
dps(memo, 0)
return res[0]
How does this algorithm work in scenarios where we skip characters : Input: s = "applepenapple", wordDict = ["apple","pen","ape"]