these compression algorithms could halve our image file sizes (but we don't use them)

แชร์
ฝัง
  • เผยแพร่เมื่อ 21 พ.ย. 2024

ความคิดเห็น • 585

  • @JentacularGent
    @JentacularGent  3 หลายเดือนก่อน +395

    TO CLARIFY: don't be afraid to use these algorithms! the patents for arithmetic coding are all expired, and the only justifiably patentable material in Microsoft's patent are specific implementations of adaptive-probability rANS (which JPEG XL actually doesn't use). Microsoft has also given anyone permission to use it in open source, royalty-free codecs
    at around 7:20, i put "NH(X) = log_2 X" ... H(X) is the entropy of a symbol, not of the code! X on the left is a symbol, and X on the right is the code ... sorry! let me know of any other corrections. and thank you for watching!!
    I saw some mixed reactions to the music. I made a poll about this for future reference, but I don't know how to link it here, so please check it out in my channel

    • @JorgetePanete
      @JorgetePanete 3 หลายเดือนก่อน +5

      Please reupload it fixed.

    • @gardian06_85
      @gardian06_85 3 หลายเดือนก่อน +4

      for the purpose of encoding files even the full ASCII set wouldn't either of these methods fall down because instead of being base 10 we are base 255 converted into base 2. granted the the opcode values would have very low occurrence for a probabilistic approach (unless the files contains assembly) then how do these algorithms deal with character sets like UTF16 or UTF32? wouldn't that imply that even the arithmetic approach would need a given char value to be 32 bits at which point we are in almost non compressed state?

    • @wirelyre
      @wirelyre 3 หลายเดือนก่อน +7

      ​@@gardian06_85
      UTF-8, UTF-16, and UTF-32 are all encodings for the same character set. So you could compress a UTF-32 file by encoding it as UTF-8 and then using one of these methods.
      You could also use 64-bit integers, or even greater bases.
      But most compression formats don't encode bytes directly. They encode symbols that are instructions to the decoder, like a little machine. "Copy the next three bytes to the output stream." "Look back 40 bytes in the output and copy 25 again."

    • @mehmetdemir-lf2vm
      @mehmetdemir-lf2vm 3 หลายเดือนก่อน +4

      please remove background music because it distracts. we can always listen to music while watching videos, but we cannot remove them from videos.

    • @meneldal
      @meneldal 2 หลายเดือนก่อน

      You can also get the code from the h264 or hevc reference encoder (which both should only use expired patents by now) for how they handle the variable probabilities and how they implement it. Basically there's a bunch of bins that store probabilities and they have parameters for a starting probability and how much it should be updated each time.

  • @leeroyjenkins0
    @leeroyjenkins0 3 หลายเดือนก่อน +1685

    0:47 intellectual property law is a completely fair and balanced system with no exploits whatsoever

    • @KillFrenzy96
      @KillFrenzy96 3 หลายเดือนก่อน +375

      Why the hell can you patent something from the public domain? It should never be enforceable, because anyone can claim they obtained it from the original public domain source itself.

    • @ryansullivan3085
      @ryansullivan3085 3 หลายเดือนก่อน +119

      One of the most frustrating things I've ever seen

    • @Gelatinocyte2
      @Gelatinocyte2 3 หลายเดือนก่อน +75

      Spiffing Brit reference detected

    • @JarheadCrayonEater
      @JarheadCrayonEater 3 หลายเดือนก่อน +8

      ​@@Gelatinocyte2someone gets it

    • @viliml2763
      @viliml2763 3 หลายเดือนก่อน +110

      @@KillFrenzy96 They didn't patent it directly, they patented some straightforward expansions and optimizations that virtually everyone would need to use in practice.

  • @timseguine2
    @timseguine2 3 หลายเดือนก่อน +630

    Patents suffer from the problem of trivial extension. The typical thing I have noticed is that you can take a well known thing, not change it at all, and use it in a context it hasn't really been used before. Then patent that.
    I can't patent Huffman Coding? What about Huffman Coding... in space?

    • @TreeLuvBurdpu
      @TreeLuvBurdpu 3 หลายเดือนก่อน +8

      You can't patent things that are in common usage. You have to be the inventor.

    • @anonymousalexander6005
      @anonymousalexander6005 3 หลายเดือนก่อน +114

      @@TreeLuvBurdpunope, as long as you can prove it’s a novel idea, there’s a possibility of getting a patent. It all depends on how good your lawyer is…

    • @timseguine2
      @timseguine2 3 หลายเดือนก่อน +97

      @@TreeLuvBurdpu tell that to the patent officer that approved some of the patents I have been directly or indirectly involved in.
      In theory, completely agree with you. In my experience that is not the case in practice, and the concrete application is more important.
      I haven't read the actual patent myself, but I found out one of the teams I worked with successfully got a patent for using CNNs for image classification in a certain medical context. They didn't invent using CNNs for image classification.

    • @thisisabsolutelystup
      @thisisabsolutelystup 3 หลายเดือนก่อน +33

      Part of the problem is that the standards of whether something can be patented or not is completely variable. In theory it has to be original and demonstrate an inventive step - something not obvious, something the inventor thought of or did.
      So Microsoft taking Huffman encoding and simply applying it to a few common critical structures and patenting that should not pass the inventive step threshold, because it's an obvious application that anyone using Huffman encoding would do.
      Patent law, especially in the software space, needs real reform somehow.

    • @timseguine2
      @timseguine2 3 หลายเดือนก่อน +11

      @@thisisabsolutelystup The huffman encoding in space example wasn't real. I was referencing the "recycled in space" trope. Maybe I am too chronically online.

  • @purplenanite
    @purplenanite 3 หลายเดือนก่อน +721

    If there were enough algorithms to do this with, I could imagine making a series of "Algorithms you can't use"

    • @au-lit
      @au-lit 3 หลายเดือนก่อน +47

      There totally is a lot of them in this situation, though usually there is good alternatives.

    • @Xankill3r
      @Xankill3r 3 หลายเดือนก่อน

      There is a whole book of these and even that just barely scratches the surface. Look up Math You Can't Use by Ben Klemens. It's over 2 decades old now so I'm sure we could easily write several new editions of it with newer patent BS like this one.

    • @beeverfeever4930
      @beeverfeever4930 3 หลายเดือนก่อน +6

      If I had to guess, there most likely would be plenty of algorithms you can't use

    • @arossfelder
      @arossfelder 3 หลายเดือนก่อน +7

      I would not recommend watching this video, it's very dangerous for a developer because it makes you liable if you end up using one of them

    • @DwAboutItManFr
      @DwAboutItManFr 3 หลายเดือนก่อน

      You can use in FOSS projects no?

  • @jeanf6295
    @jeanf6295 3 หลายเดือนก่อน +62

    This a neat video, and an interesting subject, however, it was a bit too fast paced for me. A few could be done differently :
    - Sometimes, you list goals, but do not put those in written format, which makes things harder to follow.
    - Having a synthetic representation of the algorithm as a whole could also help a lot, seeing the states of the variables evolve in real time is neat, but insufficient.
    - Getting the meaning of abstract notations takes some time, using a more explicit naming convention could help.

    • @raffimolero64
      @raffimolero64 2 หลายเดือนก่อน +8

      I agree. I couldn't follow along as well as I'd have liked. There were so many symbols with single letter names and mentions of technical terms without accompanying visualizations. In an ideal world where the editor never gets tired and never has to take breaks, every variable in the equation must constantly have its definition somewhere on-screen. either as text or as an image explaining what it is.
      But I do really appreciate the fact that the video exists at all. It's a lot more digestible than... papers.

  • @LuveelVoom
    @LuveelVoom 3 หลายเดือนก่อน +136

    jarek duda: check out this really cool compression algorithm I made! it’s free and unpatented and twice as good as Huffman coding.
    Microsoft: check out this really cool compression algorithm I totally didn’t steal from jarek duda! it’s proprietary and patented and twice as good as Huffman coding.
    patent office: yeah, we don’t have it on file. Here’s your patent, Microsoft
    jarek duda:

    • @jb31842
      @jb31842 3 หลายเดือนก่อน +24

      The "I made this" meme would have encoded the same information but vastly shorter

    • @LuveelVoom
      @LuveelVoom 3 หลายเดือนก่อน +39

      @@jb31842 Unfortunately it was patented by Microsoft.

  • @safebox36
    @safebox36 3 หลายเดือนก่อน +268

    It's actually being used in JPEG XL.
    The problem is so few programs support it despite its backing from every major web browser, photography company, open source organisation, and cybersecurity group.
    Cuts the size of PNGs in half in most cases while still retaining lossless quality.

    • @keisisqrl
      @keisisqrl 3 หลายเดือนก่อน +95

      Google pulled JPEG XL from Chromium in favor of their own webp which effectively killed it as a web format, Mozilla and WebKit (Safari) followed suit.

    • @JohnDoe-jk3vv
      @JohnDoe-jk3vv 3 หลายเดือนก่อน +7

      JPEG XL is not lossless

    • @darthjaffacake8573
      @darthjaffacake8573 3 หลายเดือนก่อน +61

      It can be, just depends on settings

    • @JohnDoe-jk3vv
      @JohnDoe-jk3vv 3 หลายเดือนก่อน

      @@darthjaffacake8573 I was wrong

    • @connor_santoyo
      @connor_santoyo 3 หลายเดือนก่อน

      ​​@@keisisqrl Google killed JPEG XL in favor of AVIF, not WebP. WebP is no longer being developed. Mozilla did not kill off JXL either, it is still available as a nightly flag. And Apple has full support for JXL across all their OSs including WebKit.

  • @Manabender
    @Manabender 3 หลายเดือนก่อน +70

    6:13 Took me a second to figure out how clever this was.
    Assuming you're playing a game of hangman in which the word to be guessed is an actual, English word, it could be plane, plant, plank, or plans. In that case, saying that the last letter is a vowel is worth two bits of information (as it divides the space of valid possibilities by 2^ *2* ). Saying that the last letter is a letter is worth zero bits, given assumptions.

    • @0LoneTech
      @0LoneTech หลายเดือนก่อน +3

      There's actually an input method based on this, called Dasher. It's literally harder to misspell words in.

  • @Anon-do7lo
    @Anon-do7lo 3 หลายเดือนก่อน +25

    My data compression professor last semester made really good points about how Huffman still has a place. It’s just a substitution thing and you can simply read the encoded text if you have the scheme. It’s a lot more performant when it comes to decoding.

    • @jursamaj
      @jursamaj 2 หลายเดือนก่อน +2

      Yes, the best compression schemes have multiple algorithms, and choose the appropriate one. Nobody is saying to just throw Huffman away.

    • @mage3690
      @mage3690 หลายเดือนก่อน

      I love asymmetric compression schemes personally, but the wiki claims that fractal compression has largely been abandoned because it became clear that fractal compression would always be very asymmetric.

  • @Xankill3r
    @Xankill3r 3 หลายเดือนก่อน +263

    We need another edition of Math You Can't Use by Ben Clemens just to cover this BS. Well, I mean we probably need a reconsideration of the patent regime first. Software - being just maths - should *not* be granted patents at all. If the whole idea behind patents was to foster innovation then "we've been sending completely unnecessary amounts of data over the internet because patent trolls" is definitely evidence that it doesn't work that way.

    • @mytech6779
      @mytech6779 3 หลายเดือนก่อน +7

      Most of the time software cannot be simply patented, however source code is covered by copywrite. They are not the same thing. There are a few exceptions but you need to dig into the specific cases to have any meaningful discussion of that area.

    • @Xankill3r
      @Xankill3r 3 หลายเดือนก่อน +22

      @@mytech6779 copyright doesn't matter here since we are talking about patents. Copyright also only applies to the specific implementation - or expression - of an idea. Therefore although there are problems with copyright as well it doesn't at least give someone a complete right over an idea.

    • @wb3904
      @wb3904 3 หลายเดือนก่อน +2

      Maths and algorithms are fundamentally different. Computers have more in common with logic than maths.

    • @Turalcar
      @Turalcar 3 หลายเดือนก่อน +15

      @@wb3904 logic is maths

    • @wb3904
      @wb3904 3 หลายเดือนก่อน +3

      @@Turalcar no maths is based on logic not the other way around

  • @arduano
    @arduano 3 หลายเดือนก่อน +11

    Thank you so much for sharing this! I was implementing LZMA earlier this year (for fun, and to learn how it works), and when I got to LZMA's range encoding component I realized how genius this approach is. It took me days to understand, but your video would've helped so much if I was around haha

  • @denislamarche224
    @denislamarche224 3 หลายเดือนก่อน +40

    I can't tell you how far above my head this video is....but it approaches infinity.

    • @aceman0000099
      @aceman0000099 3 หลายเดือนก่อน +5

      You could probably understand it if it was something physical like sorting books and you did it as a job for about an hour straight

  • @anon_y_mousse
    @anon_y_mousse 3 หลายเดือนก่อน +38

    I'm of the opinion that software shouldn't be patentable, but also that copyright should expire after no more than 28 years, although 14 would be preferable. Years ago, back when I was in middle school, I remember playing with a compression program written in QBasic which used arithmetic coding. I thought it was pretty cool, but at the time didn't know how to use it. That was significantly longer than 17 years ago now, and even if the idea were capable of being patented, it definitely should no longer be as 17 years is the limit for a patent. Of course, I also think that our patent system needs to be fixed in more ways than one, including that people with technical knowledge should be the ones to review patents in technical fields and the law, which gives us the criteria that they should be novel, should actually be applied and prevent patent squatting as MS, and many other corporations acting in bad faith, have done.

    • @peceed
      @peceed 18 วันที่ผ่านมา

      No, we need just remove patent laws. There will be inventinve AI that can solve "on flight". There is no reason to give right for ideas to the first organization that will run the search.

  • @Chalkadoo
    @Chalkadoo 3 หลายเดือนก่อน +127

    So what I got from this video is that we need patent law reforms and also Microsoft needs to get its shit kicked in REALLY hard?

    • @DiThi
      @DiThi 3 หลายเดือนก่อน +39

      Better: software patent abolition. We don't have them in Europe and we don't want them.

    • @i-am-linja
      @i-am-linja 3 หลายเดือนก่อน +1

      @@DiThi Nice, but is there a way to _prevent_ an American company from patenting your idea?

    • @i-am-linja
      @i-am-linja 3 หลายเดือนก่อน +1

      @@DiThi Or failing that, to prevent said patent being enforced in your jurisdiction?

    • @DiThi
      @DiThi 3 หลายเดือนก่อน +10

      @@i-am-linja I thought prior art would be enough, and maybe it's still enough and the patent for ANS wouldn't really hold in court, who knows. Patents are not enforced in my jurisdiction, they can't. However we live in a global world and using US patented algorithms would prevent me from publishing my software in the US.

    • @stdcall
      @stdcall หลายเดือนก่อน

      hom osuck

  • @wikiPika
    @wikiPika 3 หลายเดือนก่อน +195

    1:25 "hello, how are you? fine, thank you"

    • @gamerdomain6618
      @gamerdomain6618 3 หลายเดือนก่อน +50

      oh my gaah!

    • @puppergump4117
      @puppergump4117 3 หลายเดือนก่อน +2

      I know that from a meme someone sent and now wonder how relevant it is to the world

    • @froxcey
      @froxcey 3 หลายเดือนก่อน +26

      Hello everynyan

    • @4.0.4
      @4.0.4 3 หลายเดือนก่อน +15

      ​@@puppergump4117the anime was surprisingly influential in that it also coined the word "waifu" from one teacher saying "mai waifu".

    • @xhbirohx2214
      @xhbirohx2214 3 หลายเดือนก่อน +1

      im out of the loop, can someone fill me in?

  • @dumdum7099
    @dumdum7099 3 หลายเดือนก่อน +63

    This is way above my paygrade. I need more readings before I watch this video.

  • @dl5244
    @dl5244 2 หลายเดือนก่อน +7

    I remember seeing a demo for fractal encoding instead of JPG in the early 1990's in the days where 14.4kbps modems were still very expensive so I only had 2400 baud dialup. The demo was a real-estate website with pictures of a houses. Interlaced jpg images took a few minutes to download and view, where as the new fractal format was

    • @Tumbolisu
      @Tumbolisu 2 หลายเดือนก่อน +3

      Even with today's internet speeds, I sometimes end up having to deal with ridiculously slow connections, ranging from minutes to hours. Genuinely unable to look at pictures sent via a specific service because it loads 10 rows of pixels per second. Compression like that should be a priority for all online services, and there is no excuse for CPU load being high, considering how stupidly fast they are now.
      Oh well, how else are they gonna get us to pay more for "faster" internet?

  • @DumToasty
    @DumToasty 3 หลายเดือนก่อน +15

    1:23 Azumanga Daioh reference 「I wish I were a bird.」

  • @theparadox42
    @theparadox42 8 วันที่ผ่านมา

    I remember you way back on the days of KA. Awesome to see what you're making now. Keep it up!!

  • @el_quba
    @el_quba 3 หลายเดือนก่อน +98

    This is an excellent explanation no doubt, but I must be honest I find it very hard to follow and understand

    • @kitamashi
      @kitamashi 3 หลายเดือนก่อน +38

      It is only a good explanation if you are familiar with the jargon, in other words, it is a bad explanation.

    • @jmsether
      @jmsether 3 หลายเดือนก่อน +15

      It's basically an academic paper in video form. I found the information to be well laid out and easy to follow.

    • @atiedebee1020
      @atiedebee1020 3 หลายเดือนก่อน +13

      If its hard to follow and understand, its not a good explanation

  • @sniper9143
    @sniper9143 3 หลายเดือนก่อน +6

    I recently used an adaptive Arithmetic encoder and a LSTM for a custom compression algo, the arithmetic encoder was cool to work with

  • @kristianTV1974
    @kristianTV1974 3 หลายเดือนก่อน +454

    Typical fucking Microsoft, trying to patent something in the public domain.
    God how I hate that company.

    • @crix_h3eadshotgg992
      @crix_h3eadshotgg992 3 หลายเดือนก่อน

      @@koghslook up the Microsoft hate website. All tech corporations are dog shit evil.

    • @serkandevel7828
      @serkandevel7828 3 หลายเดือนก่อน +49

      ​@@koghs Microsoft is notorious for being patent-trolls towards FOSS

    • @PefectPiePlace2
      @PefectPiePlace2 3 หลายเดือนก่อน +80

      the problem is copyright/trademark/IP law, Microsoft is a symptom
      you should blame the root cause if you want the problem to be fixed

    • @4.0.4
      @4.0.4 3 หลายเดือนก่อน +14

      It's usually said of journalists but, "you don't hate Microsoft enough" 😂

    • @AlexGeek
      @AlexGeek 3 หลายเดือนก่อน +5

      They used to do that a lot before. But now they try to compete more with online services.

  • @ADRIANO782
    @ADRIANO782 หลายเดือนก่อน +1

    To be honest this should've been an hour long video, it really went over my head

  • @koka3243
    @koka3243 2 หลายเดือนก่อน +1

    Entropy coding explained in 5 mins. Must be a new record 🎉

  • @Flourish38
    @Flourish38 3 หลายเดือนก่อน +59

    Saw some comments complaining about the music, and just wanted to say that I disagree completely. Having some music is nice, and the choices you made are very low energy and out of the way anyways.

    • @yuriiknapik7241
      @yuriiknapik7241 3 หลายเดือนก่อน +4

      Yeah i can see your point.Maybe he could sidechain the music so when he speaks the music dives in volume more.

    • @gnt10
      @gnt10 หลายเดือนก่อน

      I think its fine as well, though it would be nice to have an option on yourube to turn off the bg music if you have your own prefered soundtrack, though the video would need to made withouth merging the soundtracks making the file a bit bigger😅
      I dont think that should be a huge isue🤔

  • @jb2170-mathmo
    @jb2170-mathmo 3 หลายเดือนก่อน +2

    This video is like the perfect counterpart to my #SoME entry :D
    Huffman Trees are the one part I left out of my vid, this is awesome

  • @Mnnvint
    @Mnnvint 2 หลายเดือนก่อน

    This was a heroic effort at an explanation video, but it should really have been a 10 video series at least.

  • @randomlegodev
    @randomlegodev 3 หลายเดือนก่อน +20

    love me a good encoding algorithm breakdown 🔥
    animations lookin snazzier

  • @mind_11
    @mind_11 3 หลายเดือนก่อน +2

    I appreciate the many funny example words you used. Very epic video! Awesome visuals and interesting explanation!

    • @simonwillover4175
      @simonwillover4175 3 หลายเดือนก่อน +1

      It was a pretty serious video. None of the examples are funny.

    • @Martin-km4gb
      @Martin-km4gb 3 หลายเดือนก่อน

      ayaya

  • @Shirani007
    @Shirani007 2 หลายเดือนก่อน

    What a wonderful presentation and justification of ANS you have posted. I loved watching it, although speech is a bit faster, I used .75 speed 😊. I would reccomend you create more videos explaining other compression algorithms too, which in fact have arithmetic roots. Highly recommend and highly appreciated dear !
    Onw video I would want can be on different methods of approximation, and second on almost all, mostly forgotten bitcodes generation methods in 1 go. Also error correction, hamming codes, reed Soliman codes too , etc.
    Looking forward to more content ❤

  • @kuroclown
    @kuroclown หลายเดือนก่อน

    3 minutes into the video and i knew.....my brain is smooth, very smooth

  • @LovelyBozo
    @LovelyBozo 3 หลายเดือนก่อน +10

    stuff like this is why I'm glad I took AP stats so I can come back and understand what I'm listening to later on this year

    • @tandemdwarf745
      @tandemdwarf745 3 หลายเดือนก่อน +4

      I got a 5 on AP stats and I'm pretty lost lol. This video is less stats heavy, at least the stuff you learn in AP, and more computer science jargon heavy. It also just moves uncomfortably fast for the stuff I have some handle on.

  • @neogen23
    @neogen23 2 หลายเดือนก่อน

    I stumbled upon ANS for the first time in this video, so thanks, but I didn't understanding it at all. I had to watch Stanford's EE 274 on the subject, where it seemed ... approachable, at least. Not that a video of this length was going to fully explain such fairly advanced topics, the pace is quite fast, but it was an excellent introduction to the subject, and it motivated me for further research, so goal achieved I guess. Also the effort in making it really shows.

  • @leeroyjenkins0
    @leeroyjenkins0 3 หลายเดือนก่อน +39

    Slight gripe, at 9:00 it was a bit confusing that you described the operations as "outputting a bit representing which half we're in" and "centering and renormalizing" instead of just saying we want to find a number within the range, that halving is just how decimal binary works, and briefly pointing out why the "centering and adding 0s" works.
    Otherwise interesting video, and cool subject, thanks!

    • @MAD-SKILLZ
      @MAD-SKILLZ 3 หลายเดือนก่อน +6

      Yeah, he lost me at that part, too.

    • @k1ll3trs86
      @k1ll3trs86 3 หลายเดือนก่อน +2

      I can see the intention of the video
      But what he doesn't realize is that thinking in whole probability is not functional, that's not how machine works and it's ultimately extremely inefficiency.
      It's a classic "looks good on paper" but in practice you are trying to force something that doesn't fit

    • @Kynatosh
      @Kynatosh 3 หลายเดือนก่อน +1

      Took me some time wonderi'g what he was getting at but it works I guess
      I understood at the end of his explanation that it was just writing the number in base 2

    • @anonymousanon4822
      @anonymousanon4822 3 หลายเดือนก่อน +1

      ​@@k1ll3trs86I don't understand what you're trying to say. This is not only good on paper but good in practice too. It is literally strictly better than huffman and huffman is used everywhere.

    • @ZeroPlayerGame
      @ZeroPlayerGame 3 หลายเดือนก่อน +1

      @@anonymousanon4822 I think what's meant is that keeping the whole real number in memory and then writing it out in binary is a bad idea (because doing math on long real numbers is a bad idea).

  • @AhrkFinTey
    @AhrkFinTey 3 หลายเดือนก่อน +1

    Nice to see someone who appreciates the second Arabesque!

  • @Psychx_
    @Psychx_ 3 หลายเดือนก่อน +10

    There is a FUSE file system called PiFS. It can store data using some sort of arithmetic coding where the resulting number itself gets compressed as an offset + length into the unlimited digits of Pi.

    • @vurpo7080
      @vurpo7080 3 หลายเดือนก่อน +4

      Relying on the assumption that pi is a normal number... which is probably true but not proven!

    • @BridgeBum
      @BridgeBum 3 หลายเดือนก่อน

      ​@@vurpo7080That would be one hell of a bug to run into some day.

    • @ThylineTheGay
      @ThylineTheGay 3 หลายเดือนก่อน

      @@vurpo7080 even if it's not, you can just split the data you want to add, might even act as a kind of data de-duplication

    • @ThylineTheGay
      @ThylineTheGay 3 หลายเดือนก่อน

      i would genuinely like to see someone make a semi-serious version of that

  • @MadocComadrin
    @MadocComadrin 3 หลายเดือนก่อน +1

    For the folks interested in functional programming or calculating correct programs, Richard Bird's "Pearls of Functional Algorithm Design" has two chapters on arithmetic coding, one on the rational number variant and one on the integer variant.

  • @thedeemon
    @thedeemon 2 หลายเดือนก่อน

    I actually used both algorithms in my lossless video codec - ScreenPressor. Initially it was range coding (arithmetic coding with integers) but it involved division which isn't very fast. Then later version used rANS which is faster but requires compressing data in reverse order. The end result was pretty great: best in class compression and good speed (when the content is not too complicated, as in typical screen data).

  • @fuzzy-02
    @fuzzy-02 2 หลายเดือนก่อน

    I understood about 25% of the video on my first try!

  • @4.0.4
    @4.0.4 3 หลายเดือนก่อน +13

    I'll be honest it was hard to follow, maybe I'm just dumb 😅
    The music was fine imo, but if people complain you can "duck" it (the volume lowers on speech).

    • @jeffreychandler8418
      @jeffreychandler8418 3 หลายเดือนก่อน +7

      no you're not dumb. This video seemed to be made for an audience that already knows compression algorithms and information theory at a competent level.

    • @ArnaldurBjarnason
      @ArnaldurBjarnason 3 หลายเดือนก่อน +5

      @@jeffreychandler8418 Not even, I'm pretty familiar with many of the concepts but he's just running through the ideas with no space to breathe or re-emphasis on the critical aspects.

    • @jeffreychandler8418
      @jeffreychandler8418 3 หลายเดือนก่อน +1

      @@ArnaldurBjarnason That's validating to hear xD I usually ascribe misunderstandings as a personal issue than a communicator issue (unless it's REALLY bad like the Zion NPS instagram or something like that).
      As I think about this video again, honestly a two minute succint summary of existing algos would be great, then trim the fat on the new algos to make those simpler, and allow more breathing space.

  • @05degrees
    @05degrees 3 หลายเดือนก่อน +10

    Haven’t watched yet but there’s a neat thing: arithmetic coding, or a closely related thing, allows one to convert a sequence of fair N-sided coin outcomes to any other discrete uniform distribution without rejection, though if you don’t want to reject you’ll need unbounded memory, and if you want to flush time to time, you’re effectively rejecting some information you’re getting from the initial sequence. But at least you can fine-tune how much you want to reject/save and change it over time!
    Maybe it’s already in the video but if it’s not, here. Also it’s as if there’s some duality about entropy generated by rejection, non-uniformness introduced if we decide to reject only a given number of times and then be complacent, working memory etc.. I don’t know in what kind of inequality this can be written but I hope information/coding theorists have it already discovered and generalized similarly to what happened to the uncertainty principle (in QM and Fourier analysis-now there exists a general operator algebra approach IIRC).

  • @RichardBronosky
    @RichardBronosky 2 หลายเดือนก่อน +1

    11:56 took me [I stopped counting at 6] times listening before I could hear "extra digit".

  • @JTCF
    @JTCF 3 หลายเดือนก่อน +9

    I am usually not in the "wow this is so cool but I didn't understand anything" group, but I think I might be with this vid. The video is paced pretty fast, if I wanted I would probably rewatch it. Currently I just feel like falling into a rabbit hole (which happens to me quite a bit) but not keeping up with the falling.

  • @Seedwreck
    @Seedwreck หลายเดือนก่อน +1

    I recommend anybody using this method to also implement try to implement more layers, compressing with algorithms and compressing neighbour pattern delimiters with some form of character.
    This will greatly improve the compression and complexity of your data!

  • @Cl0udWolf
    @Cl0udWolf 3 หลายเดือนก่อน +1

    Great manim animations. It’s a shame ANS isn’t as widely used. There are similar patent issues with lossy compression algorithms as well, but those are even more complex since there are two vectors of optimization (compression and reconstruction accuracy) where the second does not have a concrete way to measure it.

    • @Aera223
      @Aera223 3 หลายเดือนก่อน

      There are some synthetic benchmarks, like 7zip used to show programs compressed with zip vs 7zip. A standardised benchmark may be useful, though compressors could overfit for that benchmark and look better than they really are

    • @Cl0udWolf
      @Cl0udWolf 3 หลายเดือนก่อน

      @@Aera223 well zip and 7zip are still lossless, but as for lossy a benchmark doesn’t exist for that reason. Some of the reconstruction quality will be subjective. PSNR vs Compression Rate is the most common metric but PSNR can be high while having a very noisy output, or a singular spike of error, as long as it averages out. Some situations lossy compression will have rules for the application such as not deviating more than X amount, or always producing smoothed results

  • @rizqiv2
    @rizqiv2 2 หลายเดือนก่อน +46

    i dont understand anything :(

    • @gnt10
      @gnt10 หลายเดือนก่อน +1

      Same, i have no idea how to code this at all, or if it even makes sense😔

    • @rizqiv2
      @rizqiv2 หลายเดือนก่อน +1

      @@gnt10 i even master on some coding skill but still not understand :D

    • @LzkLdg
      @LzkLdg หลายเดือนก่อน +2

      I found the comment where I belong. 😊

    • @happyatheists9361
      @happyatheists9361 19 ชั่วโมงที่ผ่านมา

      😁

  • @lnee
    @lnee 2 หลายเดือนก่อน

    I realize I came up with the arithmetic numbering system algorithm when I was 14 while creating a chess compressor. Literally that algorithm.

  • @joshuabrown4952
    @joshuabrown4952 3 หลายเดือนก่อน +3

    I hope some google/amazon data centre engineers are taking notes, if they can hear the video over the water pumps out there in the Nevada desert.

  • @quintopia
    @quintopia 3 หลายเดือนก่อน +3

    When i implemented arithmetic coding a few years back, i had no idea of the fraught patent history. Nice bit of info.

  • @__MINT_
    @__MINT_ 3 หลายเดือนก่อน

    Quite interesting subject, I didn't treat probability as something related to my interest field, until now. A year ago I started writing my own lossy audio codec, which is done for now, but I didn't implement any kind of entropy coding. I tried Huffman, RLE, but only thing I got was bigger file size and increased complexity. The problem with these algorithms is that there must be a defined set of symbols and a strong pattern in encoded message, otherwise it won't work. Using raw binary sequences as symbols doesn't really work, since there is quite a lot of them and probabilities for each other sorted in decreasing order are far away from shape of 1/x function. But the only pattern I saw were groups of 0's with less 1's in between. I still can't understand how MP3 pre-processes values so that they are suitable for entropy coding. By saying values I mean quantized frequency coefficients after transform from time to frequency domain, MDCT in my case. After psychoacoustic analyzis I know which coefficients can be discarded and what precision to use for the rest of them. Coefficients are divided into subbands with fixed size, let's say 16, and then they are quantized with different precisions. Single frame can have subbands quantized with both 2 bits and 16 bits, although now I know that 16 bits was way too much even for almost lossless compression. Each subband has its scalefactor field and a precison field, saying how many bits are used per coefficient and what scalefactor was used to divide values to quantize them. Quantized values can be basically anything, I didn't see any strong pattern there. From what I heard, MP3 uses more or less fixed precisions for coefficients, and maybe that helps to entropy code them. But in my codec everything is dynamic and bitrate can vary from for example 20 kbps to 400 kbps for the same file and same quality setting, 20 kbps would be used to encode a single tone, while 400 kbps would be needed for a drum hit. Even quantized values are not fixed-length, for example with 3-bit precision, 0 would be just 0 in binary, 1 would be 100, 2 is 101, -1 is 110, -2 is 111. Scalefactors and precison fields also have variable lengths and are differential-coded. There are also magic values for those fields which can specify number of empty subbands, so few tens of empty coefficients can be stored on a few bits. And these aren't all clever mechanisms that I used instead of entropy coding. But I'd like to understand how to implement this coding for audio / how to prepare values for that coding. With that and some other changes I could create even better codec, which already outperforms MP3 a bit. It's called DAC (Dynamic Audio Codec), and I have a video about it. Unfortunately in polish, but today it shouldn't be a problem. I also shared the working codec (browser it all you need to run it), so take a look if you want. Anyways, thanks for the video and I'm waiting for the next one!

    • @__MINT_
      @__MINT_ 3 หลายเดือนก่อน

      Oh, and there is a link to a video about DAC. YT sometimes removes such comments, so I'll share just the video ID: Qy_AinciI9Y

  • @fedang
    @fedang 3 หลายเดือนก่อน +1

    Imagine leaving your hard work public for the sake of humanity and greedy corporation takes credit for it and ends up forgetting it in their patent drawer

  • @michaelbuchholz992
    @michaelbuchholz992 2 หลายเดือนก่อน

    I first heared of this about 30 years ago and i was so excited that i wrote it in C.
    I can acknowledge, it's compressing better than huffmann, but twice? Maybe with very specific data.
    But it was quite slow. Most probably because i wrote the adaptive version.
    It should be written in assembly, because it has some very special needs that no higher programming language provides. And writing a workaround to get that missing feature in C is a pain in the afternoon and also slow.
    Maybe using libgnump will be helpfull, if you don't want to use assembly - libgnump is written in assembly, as far as i know.

  • @WiseWeeabo
    @WiseWeeabo 3 หลายเดือนก่อน +15

    literally anything can be interpreted as "a single number"..

    • @ZeroPlayerGame
      @ZeroPlayerGame 3 หลายเดือนก่อน

      that's pretty much the point of the video yea

    • @Mnnvint
      @Mnnvint 2 หลายเดือนก่อน

      @@ZeroPlayerGame Find the probability of everything, and interpret "1" as the most probable thing, "2" as the second most probable thing, etc. Voila, optimal compression! there may be some slight implementation details to be worked out...
      Seriously though, one thing I found out many years ago which stuck with me, is that most compression algorithms don't use the whole number line this way. In fact, they only use a tiny fraction of the numbers on the number line. Case in point: A random file is practically never a valid zip file, not even if you strip away headers and other format bookkeeping stuff.
      Even arithmetic encoding and ANS (as described here) leave numbers on the table, so to speak. Mostly due to end handling. Defining a bijective compression scheme (where every sequence of bits decodes to just one message, and all messages can be encoded) is fiddly and most compression people don't see the point. But I think such algorithms are beautiful.

    • @ZeroPlayerGame
      @ZeroPlayerGame 2 หลายเดือนก่อน +1

      @@Mnnvint what's "2"? What's your alphabet? Huffman is bijective. Arithmetic coding is bijective if you encode a whole number of bits (might not be the case). The number line is a mathematical abstraction and doesn't exist in the physical world where encodings have to live.

    • @Mnnvint
      @Mnnvint 2 หลายเดือนก่อน

      @@ZeroPlayerGame Find the probability of the most likely message, and assign it to "1" on the number line. Find the second most likely message, and assign it to "2". There is no alphabet, we deal with the whole message at once.
      "Encodings" are a mathematical abstraction too, so they certainly don't have to live in the physical world if numbers don't :) Personally I like mapping things to the natural numbers/positive integers.
      I specifically said arithmetic encoding is not bijective *as described in this video*. There exist bijective arithmetic encoding implementations. There even exist bijective variants of Huffman, though that should probably be considered a variant and not just a different implementation.
      Bijective compression has a special meaning. It doesn't just mean that there's a bijection between messages and encodings, that's a much lower bar (although most compression methods don't meet that either). Bijective compression means that in addition to satisfying this weaker constraint, every sequence of bytes is a valid file in the compression format. There can be no "unexpected end of file" or other format errors, any sequence of bytes has to decode to a unique sequence of bytes.

  • @jazzerbyte
    @jazzerbyte 3 หลายเดือนก่อน +6

    I wonder why Apple rolled out HEIC instead of just using arithmetic compression, considering it would already be incompatible with standard JPG?

    • @vylbird8014
      @vylbird8014 3 หลายเดือนก่อน +11

      There is a technical and a business reason.
      The technical is that HEIC - like the rival AVIF - goes far beyond just arithmetic compression. It's really an adaptation of techniques developed for video compression, and will wipe the floor with JPEG, even arithmetic-compressed JPEG.
      The business reason is that HEIC, being based on the h265 video codec, is a heavily patented format - and one of the contributors to that patent pool happens to be Apple. Not only do they get to use it at a reduced licencing cost, but whenever another company wants to use HEIC, Apple gets a cut of the royalties. It's clear that JPEG is going to decline, and there are a number of formats competing to replace it - it's in Apple's best interests to make sure that the format they stand to profit from is the one to achieve dominance, and making HEIC the default format for taking photos on iPhones and iPads gives it a huge advantage - every photo-sharing service needs to at least support uploading HEIC, or else Apple users will find their photos won't share.
      There's a rival format, AVIF, which is supposed to be patent-free (you can never be sure) and pushed by a rival consortium, with Google as the major backer. They aren't trying to score any patent money, but that doesn't mean it's entirely altruistic: Their aim is to promote an open and unpatented format in order to avoid having to pay up royalties to the likes of Apple.

    • @liam3284
      @liam3284 2 หลายเดือนก่อน +2

      because they have a financial interest in it

  • @mullanalle4318
    @mullanalle4318 2 หลายเดือนก่อน

    Really well done video. Thank you!

  • @marcfruchtman9473
    @marcfruchtman9473 3 หลายเดือนก่อน

    It is important to understand that the Huffman Encoding requires that it build a "dictionary" (Huffman Tree) with the encoding and that must be included in the file so it increases the size of the compressed file. Additionally, the "encoding" used by Huffman is "Prefix-Free", meaning that no symbolic encoding can be a prefix for any other encoding used in the compression dictionary. It might have been one of the reasons why people were having trouble following. Maybe add the tree and prefix free quality in your next revision.

  • @sasjadevries
    @sasjadevries หลายเดือนก่อน

    Interesting topic, good explanation, nice animations, and a special thanks to Frédéric Chopin.
    TH-cam, is as bad as the US patent office, and doesn't recognise all authors of the music. YT didn't recognise Satie either.

  • @isidor37
    @isidor37 2 หลายเดือนก่อน +2

    This is an interesting topic and discussion, but the presentation and explanations were really hard to follow. The music was *mostly* unobtrusive, but there were a couple points where it really came into the foreground due to the tonal content and became distracting

  • @dadamaldasstuff1816
    @dadamaldasstuff1816 3 หลายเดือนก่อน +45

    You can use the JUSDTB compression (JUSDTB stands for Just Use Smaller Data Types Bro)

    • @dadamaldasstuff1816
      @dadamaldasstuff1816 3 หลายเดือนก่อน +14

      This doesn't actually exist, it's a joke.

    • @wall-e5869
      @wall-e5869 3 หลายเดือนก่อน +6

      this is buhdj
      bro
      ur
      hillarious
      dad
      jokes

    • @dadamaldasstuff1816
      @dadamaldasstuff1816 3 หลายเดือนก่อน +15

      I'm pointing at YOU, database users! You should know the data types and use them accordingly.
      You don't need a string to store one of 3 states, use an enum.
      You don't need an int for a number between 0 and 100, use a byte. It's 4x smaller.
      You should always use unsigned if you don't need negatives. It doubles the positive range.
      Also use decimal, it can store decimal places more precisely than float. If you don't have a horrendously huge range, just use it.

    • @ThylineTheGay
      @ThylineTheGay 3 หลายเดือนก่อน +1

      I prefer πFS personally

    • @Kynatosh
      @Kynatosh 3 หลายเดือนก่อน

      ​​​@@dadamaldasstuff1816wdym? A string can be two bytes of it holds just one char (and the null char appended). An enum is in most languages an integer, which can be 32 or 64 bytes and use up more space. I'm sure the db stores it at one byte though
      I still agree that you should always use the right type though. Even helps you catch bugs early when you get screamed at for having the wrong type

  • @CSMHD
    @CSMHD 2 หลายเดือนก่อน

    thank you great insight - good explanation - there were some wow moments - thanks

  • @teltubes1
    @teltubes1 3 หลายเดือนก่อน

    Great animations. I understood nothing about the proof but the examples are nice. Is there a specific software you are using for this presentation?

  • @MegaArti2000
    @MegaArti2000 หลายเดือนก่อน

    Nice video! Just one suggestion: don't let the video black-out for so long while you're talking. I was confused thinking something was wrong with the image, because you were still talking regardless hahaha

  • @puppergump4117
    @puppergump4117 3 หลายเดือนก่อน

    I think you'd make a great youtuber, depending on how long it took you to make this. If you keep the style and make some in-depth videos on things where online resources suck donkey like shaders or common library implementations you'd probably hit 100k before the year ends.

  • @TreeLuvBurdpu
    @TreeLuvBurdpu 3 หลายเดือนก่อน +5

    I'm stuck on the variable length encoding in your explanation. "Shorter codes encode more frequent symbols". How do you indicate the length of bits that should be decoded into a single symbol? Presumably, short codes would exist inside longer encodings, for instance "101” is inside " 1011”, and "10” is inside ”101”.

    • @inanefool8781
      @inanefool8781 3 หลายเดือนก่อน +15

      In the Huffman algorithm, the codes used are designed as such that no shorter code is inside a longer code.
      So in the example you've provided, 1011 would mean you have the symbol "101" and then you have another symbol starting with "1". The way huffman is usually taught in school, this involves following a tree, where 1 is right and 0 is left, and the decoder traverses that tree with each bit you read until you eventually hit a symbol, output that symbol, and then go back to the top of the tree.
      with that explanation, the symbol encoded for 101 would mean you went right, left, right in the tree, hit a symbol, and went back to the top.
      Hopefully this explanation is easy to understand, it would be better with diagrams but those are hard to make in youtube comments.

    • @tandemdwarf745
      @tandemdwarf745 3 หลายเดือนก่อน +2

      @@inanefool8781 You can pause the video at 1:48 to see this effect; all the shorter codes are not contained within the longer ones.

    • @TreeLuvBurdpu
      @TreeLuvBurdpu 3 หลายเดือนก่อน +3

      @@inanefool8781 that is actually a very good explanation that is ready to understand. Thanks. Hopefully i can remember it this time.

    • @antoine2571
      @antoine2571 3 หลายเดือนก่อน +2

      By the way a coding algorithm where this property holds (no word is prefix of another one) is called a prefix code.
      Iirc from reading CLRS, a non-prefix code can always be transformed into a prefix-code, so usually textbooks focus on prefix-code.
      Though I have very few knowledge about codes so I might be wrong.

  • @Kynatosh
    @Kynatosh 3 หลายเดือนก่อน +2

    Before 11:11, I was like great, this is cool but doesn't seem that helpful
    After that simple change, everything makes so much sense

  • @steveunderwood3683
    @steveunderwood3683 3 หลายเดือนก่อน

    Arithmetic coding is widely used. It was the only option for some standards, even before the patents expired. So, it got used, patents or not. I've had to implement it for things like JBIG coding for FAX, where its specifically there to improve on the huffman coding in the previous compression methods used for FAX.

  • @scj643
    @scj643 2 หลายเดือนก่อน

    AMR-WB+ is the same place for audio encoding/decoding. It's a very specific codec used in talking books that had its patent expire in 2020

  • @galewallblanco8184
    @galewallblanco8184 3 หลายเดือนก่อน +1

    "this is one of the way your jpegs [...] compress data without losing information"
    jpeg: lossless compression
    jpeg does lose information, but it was optimized to not lose too much *meaningful* perceptible information (that is until you zoom in)

    • @beardymcbeardface69
      @beardymcbeardface69 2 หลายเดือนก่อน

      JPEG uses both lossy and lossless compression. The lossy compression first, then lossless compression of that as a last step. Obviously with the final result being lossy.
      The lossless last step is a variant of Huffman.
      So to say that JPEG compresses data without losing information is a half truth. Badly worded on his part, I'd say, without clarifying that it uses both and the final outcome is indeed lossy.

  • @Mega-wt9do
    @Mega-wt9do 3 หลายเดือนก่อน +6

    This channel is a hidden gem :O

  • @uis246
    @uis246 3 หลายเดือนก่อน +5

    You made Jarek happy. He looks pleased in JXL discord server.

  • @rivest-oss
    @rivest-oss 17 วันที่ผ่านมา

    THIS VIDEO IS SOOO GOOOOD!

  • @BGBTech
    @BGBTech 2 หลายเดือนก่อน

    A downside with a lot of these is that they are slow: Arithmetic coding = Very slow, Bitwise Range Coding = Rather slow (but faster than traditional arithmetic coding), LZMA / 7zip uses this. Huffman can be made semi-fast (and a little faster if one limits symbols to 12 or 13 bits vs 15 or 16; with a lower limit the decoder's lookup table can fit into the L1 cache).
    Rice coding can also be made fast (more so when a similar length-limited is applied), but only works well with particular probability distributions; but can be turned into an adaptive scheme with relatively little added cost. One trick I have used is to combine Adaptive Rice with a "swap towards front" table (each symbol is encoded as an index into a table, and when encoded, the symbol at index J is swapped with the symbol at index (J*15)/16, which is used the next time this symbol is encoded). This scheme is adaptive, can be almost as fast as Huffman, and almost as good (in terms of compression) while needing less memory footprint (sometimes useful) and less code (also sometimes useful), and (unlike Huffman) the decoder lookup tables can be made read-only. But, a lot depends on what one needs...

  • @pistachos4868
    @pistachos4868 3 หลายเดือนก่อน +7

    1:12 azumanga daioh reference😺

    • @JesusPlsSaveMe
      @JesusPlsSaveMe 3 หลายเดือนก่อน +2

      Revelation 3:20
      Behold, I stand at the door, and knock: if any man hear my voice, and open the door, I will come in to him, and will sup with him, and he with me.
      HEY THERE 🤗 JESUS IS CALLING YOU TODAY. Turn away from your sins, confess, forsake them and live the victorious life. God bless.
      Revelation 22:12-14
      And, behold, I come quickly; and my reward is with me, to give every man according as his work shall be.
      I am Alpha and Omega, the beginning and the end, the first and the last.
      Blessed are they that do his commandments, that they may have right to the tree of life, and may enter in through the gates into the city.

    • @MadocComadrin
      @MadocComadrin 3 หลายเดือนก่อน +1

      Oh my Gah!

  • @simonstrandgaard5503
    @simonstrandgaard5503 3 หลายเดือนก่อน

    Great explanation

  • @liliangimenez4461
    @liliangimenez4461 3 หลายเดือนก่อน

    Thank you this helps me to understand arithmetic encoding and ANS.
    Both (+b Huffmann) assume the symbols are independant. Could Markov chains improve compression rates, or techniques based on repeated occurrences makes Markov chains irrelevant ?

  • @sapiosuicide1552
    @sapiosuicide1552 3 หลายเดือนก่อน

    This is the plot of HBO's Silicon Valley. The main protagonist Richard Hendricks creates a more efficient method of storing data and they talk about Huffman codes. Everything in this video went over my head but you should watch that show.

  • @flameofthephoenix8395
    @flameofthephoenix8395 3 หลายเดือนก่อน

    0:28 While of course it is stored as a number, any piece of data on a computer is a series of 0 or 1 bits, there are not themselves numbers, but any series of them can also be represented by a number.

  • @blacky7801
    @blacky7801 3 หลายเดือนก่อน +2

    Just do a binary search on the list of all words? brilliant!

  • @NorthWay_no
    @NorthWay_no 3 หลายเดือนก่อน

    Some day I will grok how you output a Huffman file with the weights included, or possibly how they are inferred from an adaptive version - and not taking up masses of space, relatively, for small files.

  • @rogercruz1547
    @rogercruz1547 หลายเดือนก่อน

    Intellectual property is the attempt to own somebody else's capacity to reproduce work. Ownership over brains or body of somebody else is abolished and so should patents and the whole concept of intellectual property as well be.

  • @SiMeGamer
    @SiMeGamer 26 วันที่ผ่านมา

    I think it would be nice if in future you put what people need to know beforehand.
    This was monumentally complex with how much terminology was thrown at the viewer as well as incredibly fast equation transformations.
    There was pretty much zero edge case examples (the ones you use to establish how it works and build on top of).
    I find this video pedagogically horrendous for an average viewer and there was no attempt whatsoever to dissuade anyone and inform them that it was not for them.
    Many other comments in the comment section here reflect that - the people who choose to watch this are likely a little familiar with computer science and math to what I'd argue high school degree and maybe a little more. I would expect the level of explanation to be as such. Huffman encoding is fairly straight forward to explain. This felt like you'd needs an entire degree just to grasp it when in actuality it's not that complex when you go and experiment to try and understand it on your own.
    The entire section in the middle that explains information density was, for me, impossible to understand. Part of it is the excessive use of formulas and their transformations rather than raw data examples. This felt like a thesis paper and not an educational video.
    Pedagogy is a skill. And the best way to improve at it is to practice and test. Did you even show this to anyone who isn't as knowledgeable on the subject matter? They would point out dozens of points where they got completely lost. That is where you ask questions and figure out a way to explain the concept you are talking about in more clear terms. Maybe it's he words being used. Maybe it's the pacing. Maybe it's the visuals. Maybe it's all of the above. Or maybe it's something else entirely. But you will never have the intuition until you develop it. And to me, this video shows a very deep lack of said intuition.
    I love teaching. I love learning. I think you have the skills and knowledge to remake this into something amazing. But as it is now, it's extremely lackluster.
    This almost makes me want to learn these algorithms deeper just so I could put out a better video on the subject since this, to my instinct, turns people away from using it rather the encouraging.
    I'll personally take more time to learn this since I find lossless data compression potentially very valuable in what I do for a living.
    I'm happy you showcased these even exist.
    And I hope this comment encourages you to make more stuff in the future rather than stop completely. This is made as constructive criticism. Take your time and show someone your work before you publish it. Preferably a few people with different levels of understanding so you know how approachable it is and which group to cater to. The production was very good.
    So good luck and I hope you inspired more people to pursue these types of algorithms.

  • @lucavogels
    @lucavogels หลายเดือนก่อน

    Es gibt nur eine Person die ins solch ein Straflager lebenslänglich reingesteckt gehört …

  • @picksalot1
    @picksalot1 3 หลายเดือนก่อน +2

    The music is pretty, but distracting. Imagine trying to look at and analyze a Painting while large, pretty birds fly between it and your eyes.
    If you're you feel it's necessary to have music, I suggest you lower its volume substantially.

  • @kebman
    @kebman 2 หลายเดือนก่อน

    Yes, I've implemented Huffman coding for an exam. Yay.... But it is interesting tho :D

  • @lucasa8710
    @lucasa8710 2 หลายเดือนก่อน

    this video was very difficult to digest, I don't have a math background, but I also think that the video should focus more on what something means rather then what it is

  • @user-qr4jf4tv2x
    @user-qr4jf4tv2x 3 หลายเดือนก่อน +1

    patent kills innovation

  • @RikAllen-b1c
    @RikAllen-b1c 2 หลายเดือนก่อน

    Erm, pretty sure I’m watching the video encoded with arithmetic coding. It’s the basis of h.265 and h.264 video codecs, s as long with their still image profiles.
    H.264 baseline profile is closer to Huffman, but the main profiles expect to use context adaptive binary arithmetic coding.

  • @2false637
    @2false637 3 หลายเดือนก่อน +1

    I still don’t quite understand why in the 10-symbol alphabet Huffman encoding does worse than the binary conversion (which outperforms the entropy?). Is it because, by treating the sequence as an entire number, we extract some hidden correlation/information? This is really fascinating.

    • @JentacularGent
      @JentacularGent  3 หลายเดือนก่อน +3

      entropy is a lower bound for _average_ code length, but there can (and should) still be messages with shorter codes. In particular, this message lent itself well for the binary conversion algorithm because its first digit was a 1 (i.e. it was small). Huffman performs worse in general because the entropy of a symbol is log2(10) ≈ 3.22 bits per symbol, which isn't a whole number

  • @i-am-linja
    @i-am-linja 3 หลายเดือนก่อน +1

    If I had skill, I'd release an application infringing as many obviously frivolous patents as possible, then start a GoFundMe for the legal fees. Set a damn precedent.

    • @98danielray
      @98danielray 3 หลายเดือนก่อน

      how brave

  • @ConorOHagan-l3f
    @ConorOHagan-l3f 3 หลายเดือนก่อน

    Sorry man I really wanted to keep up but I just can't. I'm sure this was a fantastic video for people smarter on the subject than myself :)

  • @jan-pi-ala-suli
    @jan-pi-ala-suli หลายเดือนก่อน

    “i reject your rejection”

  • @juancarlospizarromendez3954
    @juancarlospizarromendez3954 3 หลายเดือนก่อน

    I suggest it for the research, to use nibbles 0..9A-F instead of decimal digits or binary digits (bytes as digits are too symbols).

  • @Iamaduck888
    @Iamaduck888 3 หลายเดือนก่อน +3

    I really like this new TH-cam algorithm

  • @DB-Barrelmaker
    @DB-Barrelmaker 2 หลายเดือนก่อน

    I almost understood this but there was a damn plane kept flying over my head

  • @abd_cheese7353
    @abd_cheese7353 3 หลายเดือนก่อน +5

    This channel is boutta blow up

  • @JesseBourretGheysen
    @JesseBourretGheysen 3 หลายเดือนก่อน

    Great video, subscribed, 🎉

  • @Neptoid
    @Neptoid 3 หลายเดือนก่อน +1

    What about AI? It predicts the probability of the next tokens, why not the Hofphman thingy given those priors

    • @yuantan9292
      @yuantan9292 2 หลายเดือนก่อน

      The probability prediction and the encoding method are two separate steps, and the video just focused on the latter and not the former.
      In most of the video, the probability is fixed (or at least known) and independent, so there's no need to predict anything.
      In benchmarks using real world data, AI is indeed used for the probability prediction; for example, the top 23 (highest compression ratio, lowest file size) compressors of the enwik9 dataset all used some sort of AI and the current best (nncp v3.2) used a Transformer model.
      In real life applications though, AI is too slow compared to simple, non-AI predictors, so the smaller file size is hardly worth the extra time and power cost. While nncp v3.2 can compress enwiki9 to 49% of that of zstd (zstd 0.6.0 level 22 ultra), it takes 9.6x as much memory and 683x as much time to compress and then decompress the dataset.

  • @SupaKoopaTroopa64
    @SupaKoopaTroopa64 3 หลายเดือนก่อน

    AV1, the codec I'm watching this video in right now, uses non-binary arithmetic coding. I think the patents only apply to binary arithmetic encoding.

  • @Subbestionix
    @Subbestionix 3 หลายเดือนก่อน

    Holy.... This is dense *pun intended*

  • @Siroitin
    @Siroitin หลายเดือนก่อน

    Huffman also accepts that "01" can be multiple characters like "you"?
    Also H(X) is sometimes called Shannon's measurement of Information (SMI)