This is definitely the best video for understanding the Huffman Code! I was tearing my hair out before! Thank you! :)
I stopped watching halfway through the video because the explanation was so good, I had understood what I wanted to by then
Clear explanation of the topics. Thanks for sharing
Thank you! The explanation was crystal clear and perfectly explained!
Simple explanation leads to an excellent video. Thanks
i really love your explanation
Hi! I just wanted to say thank you so much for these videos, you helped me get an A* in computer science and I really appreciate it!!
Fantastic, congratulations! That's a big achievement. I'm glad to have been able to help in some way :)
Idk Vierra, if I follow this person's videos will it help me out a lot?
Bob Sham it’s a bit fast
2 4 set the speed to 0.5x, problem solved.
exam tomorrow EZCLAPUHHHH
Lume. Literally like last minute revision!! Our teacher this year was so bad he barely taught us anything, just made us do games, and I swear to god I have learnt more from these videos today than I have in the last year!! GOOD LUCK TO EVERYONE TAKING THE EXAM TOMORROW :) !
Same good luck x
@@amritajhujh6943 Everything went well; however, the 6-marker pseudo-code question was difficult
You should enjoy it, someday you might make something good and you might get rich
Exam todaaaaayyyy I’m fuckin hot 🥵
Looks similar to database indexing. Nice video dude
thank you! this was perfectly explained, appreciate it !!!
Very helpful!
Honestly think yr gonna be the reason I don’t fail
amazing!! best one!! keep doing videos
18b + (tree data) = total size
what about the size required to store the tree?
Well, that can't really be calculated unless you know exactly how the tree is stored by the programmer. But it clearly does increase the file size a bit
crystal clear thanks!
Thanks a lot!
What's the point of storing the char count on each non-leaf node?
Where can I find actual compression code? I wanna learn
Thanks so much 👍🏼
Thanks
Which software can we use to make a Huffman tree, e.g. MS Word, etc.?
Is there any other easy software (MS Visio), and how do we use it?
I just made this in PowerPoint - I'm not aware of anything easier, I'm afraid
Thought 8 bit ASCII was standard nowadays?
8-bit ASCII is the standard when you're working with characters, but for storage you don't want to use 8 bits for every character, because that's wasteful. The point of Huffman coding is that you can use the best number of bits for each individual character for lossless storage (usually averaging fewer than 8, though not always). When you actually want to work with the characters, you can convert back to 8-bit ASCII.
Yes, you are right, BUT one of the bits is a parity bit, so in a way ASCII uses 7 bits to store the characters
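For anyone who wants to see this concretely, here is a rough sketch in Python (not from the video) that builds Huffman codes for the example word and compares the total against fixed 8-bit ASCII. The exact 0/1 assignments depend on tie-breaking, so they may not match the video's codes, but the total bit count does.

```python
# Minimal Huffman coding sketch, standard library only.
import heapq
from collections import Counter

def huffman_codes(text):
    # Min-heap of (frequency, tiebreak, node); a node is a character (leaf)
    # or a (left, right) pair (internal node).
    heap = [(freq, i, ch) for i, (ch, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least frequent nodes...
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (left, right)))  # ...get merged
        count += 1
    # Walk the tree: 0 for a left branch, 1 for a right branch.
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"  # single-symbol edge case
    walk(heap[0][2], "")
    return codes

text = "Huffman"
codes = huffman_codes(text)
compressed_bits = sum(len(codes[ch]) for ch in text)
print(codes)
print(f"8-bit ASCII: {8 * len(text)} bits, Huffman: {compressed_bits} bits")
# Prints 56 bits for 8-bit ASCII versus 18 bits compressed (the tree/table
# still has to be stored separately, as discussed elsewhere in the thread).
```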
I don't understand why other people want to add the fifth frequency to the very first node that was made from two letters.
please send the documentation/ppt
Why don't you consider the size of the tree itself in the file? Will it be generated again at decode time?
Thanks
You would normally, but because trees can be represented in a few different ways, and different languages deal with them differently, its size is not something you can know without more information, which is why I leave it out of the equation to simplify things. Anyway, at very large file sizes, the size of the tree becomes very small relative to the rest of the file
Aha so in the end, the compressed file will combine "The Map" and "The Tree" together in one file, that's really interesting !!!
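A rough illustration of the point above about the tree overhead becoming negligible. The 320-bit table size is just an assumed figure for a small alphabet; real sizes depend entirely on how the tree or code table is serialised.

```python
# Hypothetical serialized tree/table size, in bits (assumption, not from the video).
TABLE_BITS = 320

# Message sizes in bits, from the tiny example up to roughly a 1 GB file.
for data_bits in (18, 8_000, 8_000_000, 8_000_000_000):
    overhead = TABLE_BITS / (data_bits + TABLE_BITS)
    print(f"{data_bits:>13} data bits -> table is {overhead:.4%} of the file")
```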
At the end, shouldn't it be 42 - 16 = 26?
Noice explanation
I had a question on the method you used to arrange the nodes, because I assumed the n and the a nodes would be switched
It wouldn't make a difference
It seems to me that the last character (n) in the "message" (Huffman) could have been encoded as just 1, since it is at the tail end of the message, so there is no chance of it being any other symbol that starts with a 1 (such as m = 110). That would have been 2 bits shorter. Huffman codes are NOT ideal (not the shortest coding of symbols). This is a "perfect" example. If a 1 is seen and there are more bits after it, then it would NOT be the letter n; only if a 1 is seen and we run out of data is that 1 = n. We can get away with this because there are no embedded n characters in the message (only at the very end). This alternate, shorter coding scheme would thus work.
but that would mean specifying what the last letter of the message is, so you also have to store the 'n' as-is rather than following the tree
@@CunningBard There is only 1 occurrence of n in Huffman, so having n be coded as a 1 is fine if the decoder always looks for the longest matching code first (such as u = 101). When the decoder sees a 1, it continues reading more input before deciding what symbol it is, and if that 1 is the last of the input (not already used in the decoding of a previous character), it can then properly deduce it is the letter n (cuz it cannot possibly be anything else such as H, u, or m). Therefore Huffman codes are NOT optimal, I just proved that. A coding/decoding algorithm is legit if there is no ambiguity, and in this example, using 1 = n, there is not, provided the decoder attempts to match the longest possible code first (such as 101 = u) before assuming it is an n followed by an f (which it is not). This is a special case where the last character (symbol) in the message appears only once in the message (which is the case here).
@@CunningBard The tree is basically only used to generate the codes, which are then stored in a table, so we can easily change the table entry for n from 111 to just 1, but as I already stated, we must then also change the decoder to NOT just pick the shortest matching code it sees and interpret it as that character in the lookup table. We could also do this for "a" (changing it from 00 to 0), if we make the decoder "smarter", telling it that if 2 symbols in the message still haven't been seen and it has just read the last 2 input bits, those MUST be the last 2 symbols in the message... that is, the final 01 will be interpreted as "an" and NOT "f". So the message will be "Huffman", not "Huffmf". "01" on input can have 2 different meanings but can still be made unambiguous because of where it can occur. Where it occurs gives us more information about it. A "straight" Huffman coding is not as "smart" cuz it doesn't take advantage of the position of the 01 in the input; it treats them all the same. That is clearly NOT optimal. My solution is NOT optimal either, but it is a slight improvement.
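For reference, here is a minimal sketch of how a plain prefix-code decoder handles this example. The codes for u, m, n and a are the ones quoted in this thread; the codes for H and f are assumptions chosen so the totals come to the 18 bits mentioned above. With a prefix code like this, the decoder never needs to look ahead or know where the message ends, which is the property the end-of-message trick above gives up in exchange for a couple of bits.

```python
# Codes for u, m, n, a are quoted in the thread; H and f are assumed.
CODES = {"H": "100", "u": "101", "f": "01", "m": "110", "a": "00", "n": "111"}
DECODE = {code: ch for ch, code in CODES.items()}

def decode(bits):
    out, buffer = [], ""
    for b in bits:
        buffer += b
        if buffer in DECODE:            # prefix property: no code is a prefix
            out.append(DECODE[buffer])  # of another, so emit immediately
            buffer = ""
    return "".join(out)

encoded = "".join(CODES[ch] for ch in "Huffman")
print(encoded, "->", decode(encoded))  # 18 bits -> Huffman
```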
I have one more logical algorithm to compress any file
♥♥♥♥♥♥♥ thank you
I think the values of a and n should be swapped with each other
Why? It wouldn't make a difference
Lossless compression is my mum
Ur god
The only drawback is the accent
@BachieGaga 9 dick