The Beauty of Lempel-Ziv Compression

แชร์
ฝัง
  • เผยแพร่เมื่อ 1 ต.ค. 2024

ความคิดเห็น • 132

  • @maxthreshold
    @maxthreshold 3 ปีที่แล้ว +26

    Ngl, I struggled to understand the core concept behind this algorithm. Even after watching so many "technical" videos here on youtube, I still couldn't wrap my head around it. But after watching the first 30 seconds of THIS video, it was crystal clear all of a sudden. I just love how you visually presented the inner workings of this algorithm!

    • @ArtOfTheProblem
      @ArtOfTheProblem  3 ปีที่แล้ว +3

      woo! I'm thrilled to hear this. it's how I felt when I was researching this video.

  • @amankhunt3620
    @amankhunt3620 5 ปีที่แล้ว +47

    Always believed in your channel, first priority while searching for any topic

    • @ArtOfTheProblem
      @ArtOfTheProblem  5 ปีที่แล้ว +2

      thank you for the kind words, I will continue!

  • @HarryRobins
    @HarryRobins 5 ปีที่แล้ว +40

    This is very interesting. Or should I say, "This 3s very 3n1e8e4t10g" - oh, that hasn't compressed at all, I guess it gets better the more combinations there are.

    • @umnikos
      @umnikos 5 ปีที่แล้ว +2

      0b0e0t3e0r0u0s2z0i0v0c0o0m0p5e7s9o0n
      Huh, this is even longer than if it was uncompressed...

    • @jetison333
      @jetison333 5 ปีที่แล้ว +4

      I'm sure you could very easily construct a message that would be much longer compressed like this. However, the ,longer a message is, the better chance it has that it will have repeating content meaning easier to compress. Also a number has less data than a letter because there's only 9 numbers and 25 letters, so even if the message is the same length it has less information.

    • @inarisound
      @inarisound 5 ปีที่แล้ว +5

      @@umnikos don't forget that every message you type is actually binary and not what you initially see.

    • @umnikos
      @umnikos 5 ปีที่แล้ว

      @@inarisound yes, but then how do you transmit the numbers separately from the actual bits? They're both just 0s and 1s...

    • @HarryRobins
      @HarryRobins 5 ปีที่แล้ว

      I can't decrypt that, so I think you might have done something wrong.

  • @lightyearahead
    @lightyearahead 5 ปีที่แล้ว +10

    Great video as always.
    By the way, yesterday I had my exam. And I wrote answer on LZW compression.

  • @Zeresrail
    @Zeresrail 4 ปีที่แล้ว +6

    That was so easy to understand. Why can't my proffesor be this clear? Thanks a bunch!

  • @KainniaK
    @KainniaK ปีที่แล้ว +1

    Lempel-Ziv is a very old and obsolete technique now that we have middle-out.

  • @alan2here
    @alan2here 5 ปีที่แล้ว +2

    all anchors lost, not doing well since the amputation, my markets are too bad for quotation, everyone else inside giant fish, send help :-/

  • @alurma
    @alurma 5 ปีที่แล้ว +7

    But how do you differ between regular symbols and special code numbers, for example in binary? I tried this method as you descibed but message only got longer: 42 bits from 33 bits. In your example: what if 1 is also a symbol in angiven alphabet?

    • @MusicBent
      @MusicBent 5 ปีที่แล้ว

      The video used A and B for clarity, but everything would actually just be 1 and 0. Characters like a-z, A-Z, 0-9, etc could be encoded, using a simple scheme like ASCII which is still used today.

    • @MufinMcFlufin
      @MufinMcFlufin 2 ปีที่แล้ว +1

      This implementation of the algorithm always results in the transmitted message alternating between an index and a character.

    • @techlon
      @techlon 3 หลายเดือนก่อน

      for very small data sizes compression may yield worse results than not compressing at all. Also data with little to no repetition will compress very poorly

  • @ee6lpzfzj023
    @ee6lpzfzj023 5 ปีที่แล้ว +8

    Great content, keep going! I like the style and how you bring up the philosophical side of information theory in your old videos. Nevertheless, this being more practical, it's equally interesting to watch.

    • @ArtOfTheProblem
      @ArtOfTheProblem  5 ปีที่แล้ว +1

      Thanks. yes these IEEE videos are very focused on concepts in specific academic papers. However I'm currently planing to do a new video series on AI which will follow the model in my "old" videos. Stay tuned!

  • @justindie7543
    @justindie7543 5 ปีที่แล้ว +4

    Amazing, amazing, amazing. I have always wondered how compression works. Question though, why not send 3 1 (with a space to distinguish from 31) instead of 3A?

    • @MusicBent
      @MusicBent 5 ปีที่แล้ว +1

      Tommy Dies letters, numbers, and spaces are characters, quest like the buttons on your keyboard. They are all represented as bits (0 or 1) in the end. A common and simple translation is ASCII.

  • @MehrshadGolesorkhi
    @MehrshadGolesorkhi 5 ปีที่แล้ว +6

    Great, but jif? Really?

    • @ArtOfTheProblem
      @ArtOfTheProblem  5 ปีที่แล้ว +5

      That's how the inventor pronounced it, that's how I pronounce it

    • @MehrshadGolesorkhi
      @MehrshadGolesorkhi 5 ปีที่แล้ว +1

      @@ArtOfTheProblem Then, let's agree to disagree :D

    • @Architector_4
      @Architector_4 5 ปีที่แล้ว

      yiff OwO

  • @fallacyz3r0
    @fallacyz3r0 5 ปีที่แล้ว +5

    Wow. Such an amazing explanation. I have an Information Theory exam in a few weeks and this really helped. (Along with your other videos!) Thanks a lot!

    • @ArtOfTheProblem
      @ArtOfTheProblem  5 ปีที่แล้ว +3

      excellent to hear, I was once in your position wanting this kind of help. please share with others.

    • @ArtOfTheProblem
      @ArtOfTheProblem  5 ปีที่แล้ว +3

      I remember trying to learn LempZiv and nothing...nothing...worked for me.

  • @JobvanderZwan
    @JobvanderZwan 5 ปีที่แล้ว +1

    Technically the code-book is not empty at the start: it contains the letters. Or at least, the sender and receiver must agree on an alphabet

  • @DecedentoftheDecade
    @DecedentoftheDecade 5 ปีที่แล้ว +5

    Excellently made. Would be thrilled if you uploaded regularly (daily/weekly)

    • @ArtOfTheProblem
      @ArtOfTheProblem  5 ปีที่แล้ว +2

      Thank you. Monthly would be top speed for my style of production. You can make this a reality by supporting future work here: www.patreon.com/artoftheproblem

  • @GiesbertNijhuis
    @GiesbertNijhuis 5 ปีที่แล้ว +1

    I really enjoyed this video. Thanks for making it.

  • @yomanos
    @yomanos 5 ปีที่แล้ว +5

    This channel has great content. Keep it up Brit!

  • @AntonFetzer
    @AntonFetzer 3 ปีที่แล้ว +2

    Great explanation and example.
    It felt almost like ASMR to hear the algorithm play out in the example.

  • @siennahsu8990
    @siennahsu8990 4 ปีที่แล้ว +2

    The topic is so well-explained. Can't believe the vid doesn't have more views!

  • @Denverse
    @Denverse 4 ปีที่แล้ว +1

    Awesome Explanation!, Thank you!!!

  • @samarthtandale9121
    @samarthtandale9121 10 หลายเดือนก่อน +3

    Wow! what a simple and intuitive explanation, excellent!

    • @ArtOfTheProblem
      @ArtOfTheProblem  10 หลายเดือนก่อน

      thank you glad you found this, was it suggested or did you just discover it? I'm trying to learn

    • @samarthtandale9121
      @samarthtandale9121 9 หลายเดือนก่อน

      @@ArtOfTheProblem I'm ur huge fan bro 🥺 ... Please remember me, I'm gonna meet u 1 day! Keep up the good work 💪😎💯

  • @irvanicsmoke1132
    @irvanicsmoke1132 5 หลายเดือนก่อน +1

    wowwwwww. I love watching this vid. Easy to undestand, clear, and fun! Thank you so much

    • @ArtOfTheProblem
      @ArtOfTheProblem  5 หลายเดือนก่อน +1

      yes I loved making this! i was so confused when I first had to learn it

  • @christianjansson6806
    @christianjansson6806 2 ปีที่แล้ว

    Perfect. So next time I'll send a message containing only two letters, I'll contact you for a smart algorithm for compression!

  • @GegoXaren
    @GegoXaren 5 ปีที่แล้ว +2

    After 5 years since I requested it... Thank you!
    (it was in the comment section of the Makrov Chain videos I asked you to talk about this... Due to LZMA).

    • @ArtOfTheProblem
      @ArtOfTheProblem  5 ปีที่แล้ว +3

      Amazing...slow but steady :)

    • @MusicBent
      @MusicBent 5 ปีที่แล้ว +3

      Has it really been that long 😮 It’s so great that this channel hasn’t died, or sacrificed quality in the name of quantity!

  • @XanderGouws
    @XanderGouws 5 ปีที่แล้ว +3

    At 8:47, can't we write ABBA as 31, since ABB = 3, A = 1?

    • @MusicBent
      @MusicBent 5 ปีที่แล้ว +1

      Remember computers would be using only 1 and 0 s, not letters. In the video, the letters A and B are used to emphasize the difference between coded and uncoded parts of the message. You could replace all of the As with 0s and the Bs with 1s and it would still work. It just would be less clear in the video.

    • @griggiorouge
      @griggiorouge 5 ปีที่แล้ว

      @@MusicBent ok but then the question would be why didn't they code both the ABB and the A, and send it 31 and not 3A (meaning part coded and part not coded), I amnot sure if the guy making this video did that by mistake or the code has to be that way.

    • @MusicBent
      @MusicBent 5 ปีที่แล้ว

      andres martinez hey, I looked more closely at that part of the video and I guess I’m not sure either. I would assume the video is correct, but not explained clearly...
      The first four messages all are the same as the previous one except the last letter is new. The fifth message breaks this trend. Maybe that has something to do with it...

    • @griggiorouge
      @griggiorouge 5 ปีที่แล้ว +1

      @@MusicBent Hey, if you think about it, it can`t send the code 31 as it would be confused with the 31st sequence... and the same goes for 21 42 etc... so I guess each sequence has to be coded as part code (number) and part not code (letter: or original message).

    • @griggiorouge
      @griggiorouge 5 ปีที่แล้ว +2

      No you can`t because then you wouldn`t be able to use the code 31 for the 31st sequence, and so goes for the 42 41 32 etc... at 10:04 he says:
      "notice the sender is always adding what it hasn`t seen to a growing code book, and then sending that sequence in coded form, MINUS the last letter (!), that`s the trick."

  • @kevinmaison9403
    @kevinmaison9403 5 ปีที่แล้ว +2

    ABBA is 4, Mamma Mia

  • @KishanKumar-mz3xr
    @KishanKumar-mz3xr 5 ปีที่แล้ว +1

    Wow amazing. It was very smooth teaching. Thank a lot.

  • @inarisound
    @inarisound 5 ปีที่แล้ว +8

    Fantastic, now I want to purchase my WinRAR activation

    • @ArtOfTheProblem
      @ArtOfTheProblem  5 ปีที่แล้ว +2

      Thank you for the feedback on the pacing I appreciate it.

  • @nyscersul42
    @nyscersul42 2 ปีที่แล้ว +1

    You can tell as you were narrating it felt repetitive to use the same words for each step, during the compression explanation, repeatedly. But, it really did help for following the sequence. Another good video. :)

  • @MugiwaraSuponji
    @MugiwaraSuponji 5 ปีที่แล้ว +1

    If this series of videos was a psyop by IEEE Information Theory Society to steal away electrical engineers from other, non-information theory subdomains, well..........it might be working on me! :O

  • @SetMyLife
    @SetMyLife 5 ปีที่แล้ว +1

    Awesome. You had actual A. Lempel as a consultant?

    • @ArtOfTheProblem
      @ArtOfTheProblem  5 ปีที่แล้ว +3

      That's correct, we were lucky to have the original authors participate in all the "information age" videos we've done together.

  • @niua511
    @niua511 9 หลายเดือนก่อน

    Amazing! I love your animation and background music. Keep going

    • @ArtOfTheProblem
      @ArtOfTheProblem  9 หลายเดือนก่อน

      thanks! more IEEE videos coming soon

  • @albertologp5184
    @albertologp5184 ปีที่แล้ว

    Perfectly explained video. Congratulations!

  • @grjesus9979
    @grjesus9979 3 ปีที่แล้ว +1

    Wow! I'm amazed for such quality of explanation, thank you!!

    • @ArtOfTheProblem
      @ArtOfTheProblem  3 ปีที่แล้ว

      appreciate the feedback, stick around for more

  • @ashayshirwadkar
    @ashayshirwadkar 5 ปีที่แล้ว +1

    Wow.. Great video. I am intrigued by how Lempel-Ziv came up with this. Simply beauty!!

  • @Enigma758
    @Enigma758 3 ปีที่แล้ว

    You have a talent for stating things clearly.

  • @alurma
    @alurma 5 ปีที่แล้ว +1

    Amazing! I always wondered how zip compression works. Thanks!!! :)

  • @aronpop1447
    @aronpop1447 3 ปีที่แล้ว

    you deserve more subscribers

  • @prateekthakur1347
    @prateekthakur1347 5 ปีที่แล้ว +3

    I love this channel.

  • @-tiktoktrend1212
    @-tiktoktrend1212 3 ปีที่แล้ว +1

    wow, you made it so simple ! very easy to understand, thank you!

    • @ArtOfTheProblem
      @ArtOfTheProblem  3 ปีที่แล้ว +1

      thrilled to hear it, made this video because I hadn't seen any others do it simply

  • @LongfellowBrady
    @LongfellowBrady หลายเดือนก่อน

    92286 Pasquale Camp

  • @ashokrajur09
    @ashokrajur09 5 ปีที่แล้ว +1

    you are a very good teacher.. thanks for educating.. and god bless you to continue the good work

  • @dragonsaige
    @dragonsaige 5 ปีที่แล้ว +3

    Always love the videos. Great work.

  • @mo5ch
    @mo5ch 5 ปีที่แล้ว +1

    If you have the last sequence ABA, why do you replace AB with the value in the lookup table, but not the last A?
    E.g. if we had ABAB, wouldn't we replace it with 22?

    • @WurstRELOADED
      @WurstRELOADED 5 ปีที่แล้ว +10

      The simple answer is "because that's how the algorithm is defined". You always find the longest sequence that is in your table already and add exactly one letter. If you would always compress multiple messages, then you would quickly run into trouble. For example, if your message is ABAABABBBABABA, then after the first two steps, your table would be "A-1, B-2". After that, you could "compress" the entire message as 112122212121. This wouldn't actually save you any space though, because you mapped one letter to one number.

    • @pedrostuginski7616
      @pedrostuginski7616 5 ปีที่แล้ว +2

      Basically because you don't have spaces, so you wouldn't be able to know if you have "two one" or "twenty-one". Remember that in a real system you just have 0 and 1.

  • @paulstelian97
    @paulstelian97 4 ปีที่แล้ว

    Funny how the numbers and letters are then compressed again using Huffman coding, which itself is also very interesting.

  • @nealski26
    @nealski26 6 หลายเดือนก่อน +1

    Great description! Thanks.

    • @ArtOfTheProblem
      @ArtOfTheProblem  5 หลายเดือนก่อน

      New video is up on Evolution of Intelligence th-cam.com/video/5EcQ1IcEMFQ/w-d-xo.html

  • @Bc232klm
    @Bc232klm 5 ปีที่แล้ว +1

    Could you use names other than Alice or Bob?

    • @MusicBent
      @MusicBent 5 ปีที่แล้ว

      cubs0110 Adam and Betty
      Idk, apparently it’s just a tradition in computer / communication science

  • @lucasantosmartins9084
    @lucasantosmartins9084 2 ปีที่แล้ว

    wow thanks

  • @shirinmohebbi8624
    @shirinmohebbi8624 3 ปีที่แล้ว

    thank you

  • @studentcommenter5858
    @studentcommenter5858 5 ปีที่แล้ว

    So basically they are finding prefix codes on the fly and sending them.

  • @sourabhk2373
    @sourabhk2373 6 หลายเดือนก่อน +1

    Thanks!

    • @ArtOfTheProblem
      @ArtOfTheProblem  6 หลายเดือนก่อน

      Thank you kindly! I really appreciate this

    • @ArtOfTheProblem
      @ArtOfTheProblem  5 หลายเดือนก่อน

      Just posted a new vid! th-cam.com/video/5EcQ1IcEMFQ/w-d-xo.html

  • @anaswicaksono1123
    @anaswicaksono1123 ปีที่แล้ว

    this is concise and precise explanation, easy to understand, thanks a lot sir

  • @jaypalsha1701
    @jaypalsha1701 4 ปีที่แล้ว

    nice

  • @youralbaniandad8459
    @youralbaniandad8459 2 ปีที่แล้ว

    thank you you saved my ass

    • @ArtOfTheProblem
      @ArtOfTheProblem  2 ปีที่แล้ว

      happy to help, I remember this crushed me in school

  • @toyodathon08
    @toyodathon08 5 ปีที่แล้ว

    How do you delimit between symbols?

  • @julianmahler2388
    @julianmahler2388 3 ปีที่แล้ว

    Very good explanation, thank you! However if you try the method on "TESTTESTTEST" it will make the dictionary ['T', 'ES', 'TT', 'EST', 'TE', 'ST'] and it'll never find out that compressing it as 3 * 'TEST' would be much more efficient? Or am I missing something?

    • @porglezomp7235
      @porglezomp7235 2 ปีที่แล้ว +1

      Modern LZ approaches are able to have overlapping components-the codes are just described in terms of the previous values, so you could encode that as like
      4,0,8TEST, which says "take 4 literals, and starting from 0 symbols back, copy 8 symbols"
      so it would first write out:
      TEST
      ^
      and then copy 8 symbols,
      after 4: TESTTEST
      ^
      after 8: TESTTESTTEST
      ^

    • @FriedMonkey362
      @FriedMonkey362 ปีที่แล้ว

      T,1E,2S,3T,4,4,4 itls works efriencet and amXing

  • @griggiorouge
    @griggiorouge 5 ปีที่แล้ว

    1 person is nostalgic for the seamen

  • @dyyno5578
    @dyyno5578 2 ปีที่แล้ว

    Great explanation !

  • @fabricioalmeida351
    @fabricioalmeida351 4 ปีที่แล้ว

    Excellent!

    • @ArtOfTheProblem
      @ArtOfTheProblem  4 ปีที่แล้ว +1

      thanks for your feedback, stay tuned for more

  • @chris_1337
    @chris_1337 5 ปีที่แล้ว +1

    Awesome as always

    • @ArtOfTheProblem
      @ArtOfTheProblem  5 ปีที่แล้ว

      Appreciate it, stay tuned for videos on AI

  • @flytape8490
    @flytape8490 5 ปีที่แล้ว

    Do bzip2 next

  • @furkanunluturk7785
    @furkanunluturk7785 3 ปีที่แล้ว

    this video created such a great intuition in my mind. thank you!

  • @quantranminh1781
    @quantranminh1781 3 ปีที่แล้ว

    Best explanation so far. Thank you

  • @prachinainawa3055
    @prachinainawa3055 4 ปีที่แล้ว

    Unique explanation 😍

    • @ArtOfTheProblem
      @ArtOfTheProblem  4 ปีที่แล้ว +1

      thanks prachi, stay tuned for more.

  • @PradeepBanerjeeKr
    @PradeepBanerjeeKr 4 ปีที่แล้ว

    Simply awesome!

  • @himanshukesarwani7446
    @himanshukesarwani7446 5 ปีที่แล้ว +1

    best videos for learning information theory concepts

  • @1schwererziehbar1
    @1schwererziehbar1 5 ปีที่แล้ว

    I am very skeptical of the details of your example.

  • @zxbryc
    @zxbryc 4 ปีที่แล้ว

    so this is LZ78 and not LZ77? as I understand it, LZ77 is the popular/most efficient one.