Encode and Decode Strings - Leetcode 271
ฝัง
- เผยแพร่เมื่อ 22 มิ.ย. 2024
- 🚀 neetcode.io/ - Get lifetime access to every course I ever create!
Checkout my second Channel: / @neetcodeio
🥷 Discord: / discord
🧑💼 LinkedIn: / navdeep-singh-3aaa14161
🐦 Twitter: / neetcode1
📷 Instagram: / neetcodeio
🎵 TikTok: / neetcode.io
#leetcode #neetcode #python
Or use an escape character as most strings in programming languages do.
the method mentioned in the video works without having to reserve a special character as the escape character
Then it's not properly encoded. If you're going to decode it needs a valid encoding method.
I think the main drawback of this is the space occupied by the encoded string, if the list of strings doesn't use the escape character then it's fine, but the worst case scenario would be a list of identical strings containing only that character and it repeats, in that case the encoded string's length will be twice as big at least. One way to solve this is to look for the least common character in the list of strings and use it as an escape character, that would solve the space issue but that needs more processing as you have to identify the less common character in the list of strings. Using the method in the video avoids all of these issues from the get go. Cool!
A software we're using picked a character in the Unicode Private Use Areas, and uses it to separate parts in a string. It's actually pretty clever, since that area is guaranteed by Unicode never to be assigned, and thus allows interested parties to use their for own purposes.
I think when I had a problem like that I just used \0
oh wow new style love it
A trick I've used for when you want to load an entire line of a file into a single field in a database is to use the hex character for backspace (\08 I think) because you can almost guarantee there will never be a backspace character in your data (unless it's some sort of user behavior logging data). I imagine you could use that as a delimiter here although the interviewer might not be happy with it :)
Woww, i always used to do that in my mind for various questions, didn't knew this was a pattern also. Nice
Can you just use a number instead of number-special character? Looks like it can be omitted and it will work the same
the string "4total" would have your code try to read 64 characters when it turns it into "64total" which is why you need a character so it becomes 6#4total
doesn’t work for multi-digit numbers
@@JM-gg8ct Smart example
This is gooooood 🎉
finally, a leetcode that makes sense
Amazing
Nice! 🎉
I've failed a C++ interview on this. Was an interval mapping question but the gist of the solution was similar to this.
RESP also use this
Nice tip
Seems a bit or an overkill... Just use an uncommon unicode character as separator, or first escape your separator
so then what if the number# pattern already exists in the list of strings?
Consider the following example
ne4#et, code
It'll will code it as
6#ne4#et4#code
because ne4#et is 6 characters no matter what it contains
So decoding will start with 6# which will count 6 characters after first # -> ne4#et
then it'll decode 4# -> code
It doesn't matter. Each string in the list will be encoded as "{num}#{string}" so, when it's time to decode, you simply start at the beginning:
while the encoded string is not empty:
length = extract number until #
ignore #
string = extract length characters
append string to result list
As you can see, the algorithm doesn't parse the number inside the original string.
close to the concept of TLV - tag, length, value
Add space then split combined string
this is also quite similar to bencodes
Spilt to array and join.
just hope you don't have a number to encode at the end of your string, encoding an address for example like this is bound to have issue. You could use invisible characters or better combinaison of characters (like #a trigram of the software / your name / the company#).
Remember that this type of problem you have to ask question to define what are the constraint and type of data, and explain why you would implement this and not something else.
Why is this a problem? If I have a string let’s say “text7” it would be encoded as “6#text7” which means the last number would be still part of the data. This is basically the same concept with character count in data link protocol (except character count the counter also as a char)
Dude sounds like a character from Silicon Valley show. Not surprisingly though.
Using a delimiter and an escape character would be fine too, right?
Do you still need the # ? I think number is enough
2hello turns into 62hello if you dont use the # like so: 6#2hello
@@spacey6960 right! Thank you very much!
Similar to bencode
If you're gonna use python, why just don't use join and split string methods?
How do you know where to split on the decode part?
@@nakodares5982 you can choose whatever character you like, just make sure it's the same in the encode and decoee methods. Example: '#'.join and then .split('#')
@@nakodares5982 it'll be the same you use to encode it, if you choose to encode with '#'.join, then you have to decore with .split('#')
Why not use \0?
but what if those numbers are already part of the original words?
Answer is down by ''DavidAguileraMoncusi''
"3#not5#today"
I might be dumb for saying this but cant you just store it as a list and use the .join() method to combine them and .split() method to separate them.
what if one of the strings contains a number followed by a #?
You always start with a number and pound. Then you know the next x characters are meant as text and not meta characters. After the x characters you will be at the next number, cycle repeats.
@@tigerdan that completely went over my head
makes a lot of sense now
thanks!
Json.Serialize()
You just used the most expensive separator :))
What if I use a very complicated string like "#&442**#&" to join the strings. Then I don't have to calculate the length of the string.
How would you decode this string?
The point is that it should work for ANY string. If the input for some reason had the long separator it would break.
..
What if it has a number# in the string? Just keep track of the length in an int array or something lol...
What if it has a what? Can you give an example where this would not work
i think following example
ne4#et, code
but it'll work because it'll become
code - 6#ne4#et4#code
So decode decode 6# which will count 6 characters after first # -> ne4#et
then it'll decode 4# -> code
Consider the following example
ne4#et, code
It'll will code it as
6#ne4#et4#code
because ne4#et is 6 characters no matter what it contains
So decoding will start with 6# which will count 6 characters after first # -> ne4#et
then it'll decode 4# -> code
2#hello turn into 7#2#hello which is fine bc you read the 7, the first #, and then the next 7 characters are the string itself
The pound sign is always guaranteed as per the convention. So 7#neet works because there will always be 2 characters before the start of the word and the character immediately preceding the word will always be a #.
can't you just use JSON
Its very expensive compared to this
csv libs have solved this problem. But import csv will probably not be allowed by your interviewer. But maybe it will! Give it a try.
You could also just use a single # character as a separator. If you happen to have other # characters in the string then you can just add an additional one. So when you see a single # it's a separator and when you see more than one you can just remove one of them and consider it as part of the string.
Would this work for ["#", "##", "###"]?
@@JohnnyAdroit You're right, it wouldn't :(
@@JohnnyAdroitit would if you also add a space after each one