these compression algorithms could halve our image file sizes (but we don't use them)

แชร์
ฝัง
  • เผยแพร่เมื่อ 21 ธ.ค. 2024

ความคิดเห็น • 592

  • @JentacularGent
    @JentacularGent  4 หลายเดือนก่อน +412

    TO CLARIFY: don't be afraid to use these algorithms! the patents for arithmetic coding are all expired, and the only justifiably patentable material in Microsoft's patent are specific implementations of adaptive-probability rANS (which JPEG XL actually doesn't use). Microsoft has also given anyone permission to use it in open source, royalty-free codecs
    at around 7:20, i put "NH(X) = log_2 X" ... H(X) is the entropy of a symbol, not of the code! X on the left is a symbol, and X on the right is the code ... sorry! let me know of any other corrections. and thank you for watching!!
    I saw some mixed reactions to the music. I made a poll about this for future reference, but I don't know how to link it here, so please check it out in my channel

    • @JorgetePanete
      @JorgetePanete 4 หลายเดือนก่อน +5

      Please reupload it fixed.

    • @gardian06_85
      @gardian06_85 4 หลายเดือนก่อน +4

      for the purpose of encoding files even the full ASCII set wouldn't either of these methods fall down because instead of being base 10 we are base 255 converted into base 2. granted the the opcode values would have very low occurrence for a probabilistic approach (unless the files contains assembly) then how do these algorithms deal with character sets like UTF16 or UTF32? wouldn't that imply that even the arithmetic approach would need a given char value to be 32 bits at which point we are in almost non compressed state?

    • @wirelyre
      @wirelyre 4 หลายเดือนก่อน +7

      ​@@gardian06_85
      UTF-8, UTF-16, and UTF-32 are all encodings for the same character set. So you could compress a UTF-32 file by encoding it as UTF-8 and then using one of these methods.
      You could also use 64-bit integers, or even greater bases.
      But most compression formats don't encode bytes directly. They encode symbols that are instructions to the decoder, like a little machine. "Copy the next three bytes to the output stream." "Look back 40 bytes in the output and copy 25 again."

    • @mehmetdemir-lf2vm
      @mehmetdemir-lf2vm 4 หลายเดือนก่อน +4

      please remove background music because it distracts. we can always listen to music while watching videos, but we cannot remove them from videos.

    • @meneldal
      @meneldal 4 หลายเดือนก่อน

      You can also get the code from the h264 or hevc reference encoder (which both should only use expired patents by now) for how they handle the variable probabilities and how they implement it. Basically there's a bunch of bins that store probabilities and they have parameters for a starting probability and how much it should be updated each time.

  • @leeroyjenkins0
    @leeroyjenkins0 4 หลายเดือนก่อน +1734

    0:47 intellectual property law is a completely fair and balanced system with no exploits whatsoever

    • @KillFrenzy96
      @KillFrenzy96 4 หลายเดือนก่อน +384

      Why the hell can you patent something from the public domain? It should never be enforceable, because anyone can claim they obtained it from the original public domain source itself.

    • @ryansullivan3085
      @ryansullivan3085 4 หลายเดือนก่อน +121

      One of the most frustrating things I've ever seen

    • @Gelatinocyte2
      @Gelatinocyte2 4 หลายเดือนก่อน +76

      Spiffing Brit reference detected

    • @JarheadCrayonEater
      @JarheadCrayonEater 4 หลายเดือนก่อน +8

      ​@@Gelatinocyte2someone gets it

    • @viliml2763
      @viliml2763 4 หลายเดือนก่อน +111

      @@KillFrenzy96 They didn't patent it directly, they patented some straightforward expansions and optimizations that virtually everyone would need to use in practice.

  • @timseguine2
    @timseguine2 4 หลายเดือนก่อน +658

    Patents suffer from the problem of trivial extension. The typical thing I have noticed is that you can take a well known thing, not change it at all, and use it in a context it hasn't really been used before. Then patent that.
    I can't patent Huffman Coding? What about Huffman Coding... in space?

    • @TreeLuvBurdpu
      @TreeLuvBurdpu 4 หลายเดือนก่อน +8

      You can't patent things that are in common usage. You have to be the inventor.

    • @anonymousalexander6005
      @anonymousalexander6005 4 หลายเดือนก่อน +122

      @@TreeLuvBurdpunope, as long as you can prove it’s a novel idea, there’s a possibility of getting a patent. It all depends on how good your lawyer is…

    • @timseguine2
      @timseguine2 4 หลายเดือนก่อน +99

      @@TreeLuvBurdpu tell that to the patent officer that approved some of the patents I have been directly or indirectly involved in.
      In theory, completely agree with you. In my experience that is not the case in practice, and the concrete application is more important.
      I haven't read the actual patent myself, but I found out one of the teams I worked with successfully got a patent for using CNNs for image classification in a certain medical context. They didn't invent using CNNs for image classification.

    • @thisisabsolutelystup
      @thisisabsolutelystup 4 หลายเดือนก่อน +34

      Part of the problem is that the standards of whether something can be patented or not is completely variable. In theory it has to be original and demonstrate an inventive step - something not obvious, something the inventor thought of or did.
      So Microsoft taking Huffman encoding and simply applying it to a few common critical structures and patenting that should not pass the inventive step threshold, because it's an obvious application that anyone using Huffman encoding would do.
      Patent law, especially in the software space, needs real reform somehow.

    • @timseguine2
      @timseguine2 4 หลายเดือนก่อน +12

      @@thisisabsolutelystup The huffman encoding in space example wasn't real. I was referencing the "recycled in space" trope. Maybe I am too chronically online.

  • @purplenanite
    @purplenanite 4 หลายเดือนก่อน +746

    If there were enough algorithms to do this with, I could imagine making a series of "Algorithms you can't use"

    • @au-lit
      @au-lit 4 หลายเดือนก่อน +47

      There totally is a lot of them in this situation, though usually there is good alternatives.

    • @Xankill3r
      @Xankill3r 4 หลายเดือนก่อน

      There is a whole book of these and even that just barely scratches the surface. Look up Math You Can't Use by Ben Klemens. It's over 2 decades old now so I'm sure we could easily write several new editions of it with newer patent BS like this one.

    • @beeverfeever4930
      @beeverfeever4930 4 หลายเดือนก่อน +6

      If I had to guess, there most likely would be plenty of algorithms you can't use

    • @arossfelder
      @arossfelder 4 หลายเดือนก่อน +7

      I would not recommend watching this video, it's very dangerous for a developer because it makes you liable if you end up using one of them

    • @DwAboutItManFr
      @DwAboutItManFr 4 หลายเดือนก่อน

      You can use in FOSS projects no?

  • @jeanf6295
    @jeanf6295 4 หลายเดือนก่อน +76

    This a neat video, and an interesting subject, however, it was a bit too fast paced for me. A few could be done differently :
    - Sometimes, you list goals, but do not put those in written format, which makes things harder to follow.
    - Having a synthetic representation of the algorithm as a whole could also help a lot, seeing the states of the variables evolve in real time is neat, but insufficient.
    - Getting the meaning of abstract notations takes some time, using a more explicit naming convention could help.

    • @raffimolero64
      @raffimolero64 3 หลายเดือนก่อน +9

      I agree. I couldn't follow along as well as I'd have liked. There were so many symbols with single letter names and mentions of technical terms without accompanying visualizations. In an ideal world where the editor never gets tired and never has to take breaks, every variable in the equation must constantly have its definition somewhere on-screen. either as text or as an image explaining what it is.
      But I do really appreciate the fact that the video exists at all. It's a lot more digestible than... papers.

  • @safebox36
    @safebox36 4 หลายเดือนก่อน +278

    It's actually being used in JPEG XL.
    The problem is so few programs support it despite its backing from every major web browser, photography company, open source organisation, and cybersecurity group.
    Cuts the size of PNGs in half in most cases while still retaining lossless quality.

    • @keisisqrl
      @keisisqrl 4 หลายเดือนก่อน +98

      Google pulled JPEG XL from Chromium in favor of their own webp which effectively killed it as a web format, Mozilla and WebKit (Safari) followed suit.

    • @JohnDoe-jk3vv
      @JohnDoe-jk3vv 4 หลายเดือนก่อน +8

      JPEG XL is not lossless

    • @darthjaffacake8573
      @darthjaffacake8573 4 หลายเดือนก่อน +63

      It can be, just depends on settings

    • @JohnDoe-jk3vv
      @JohnDoe-jk3vv 4 หลายเดือนก่อน

      @@darthjaffacake8573 I was wrong

    • @connor_santoyo
      @connor_santoyo 4 หลายเดือนก่อน

      ​​@@keisisqrl Google killed JPEG XL in favor of AVIF, not WebP. WebP is no longer being developed. Mozilla did not kill off JXL either, it is still available as a nightly flag. And Apple has full support for JXL across all their OSs including WebKit.

  • @Manabender
    @Manabender 4 หลายเดือนก่อน +75

    6:13 Took me a second to figure out how clever this was.
    Assuming you're playing a game of hangman in which the word to be guessed is an actual, English word, it could be plane, plant, plank, or plans. In that case, saying that the last letter is a vowel is worth two bits of information (as it divides the space of valid possibilities by 2^ *2* ). Saying that the last letter is a letter is worth zero bits, given assumptions.

    • @0LoneTech
      @0LoneTech 2 หลายเดือนก่อน +3

      There's actually an input method based on this, called Dasher. It's literally harder to misspell words in.

  • @LuveelVoom
    @LuveelVoom 4 หลายเดือนก่อน +146

    jarek duda: check out this really cool compression algorithm I made! it’s free and unpatented and twice as good as Huffman coding.
    Microsoft: check out this really cool compression algorithm I totally didn’t steal from jarek duda! it’s proprietary and patented and twice as good as Huffman coding.
    patent office: yeah, we don’t have it on file. Here’s your patent, Microsoft
    jarek duda:

    • @jb31842
      @jb31842 4 หลายเดือนก่อน +27

      The "I made this" meme would have encoded the same information but vastly shorter

    • @LuveelVoom
      @LuveelVoom 4 หลายเดือนก่อน +41

      @@jb31842 Unfortunately it was patented by Microsoft.

  • @anon_y_mousse
    @anon_y_mousse 4 หลายเดือนก่อน +42

    I'm of the opinion that software shouldn't be patentable, but also that copyright should expire after no more than 28 years, although 14 would be preferable. Years ago, back when I was in middle school, I remember playing with a compression program written in QBasic which used arithmetic coding. I thought it was pretty cool, but at the time didn't know how to use it. That was significantly longer than 17 years ago now, and even if the idea were capable of being patented, it definitely should no longer be as 17 years is the limit for a patent. Of course, I also think that our patent system needs to be fixed in more ways than one, including that people with technical knowledge should be the ones to review patents in technical fields and the law, which gives us the criteria that they should be novel, should actually be applied and prevent patent squatting as MS, and many other corporations acting in bad faith, have done.

    • @peceed
      @peceed หลายเดือนก่อน +1

      No, we need just remove patent laws. There will be inventinve AI that can solve "on flight". There is no reason to give right for ideas to the first organization that will run the search.

  • @denislamarche224
    @denislamarche224 4 หลายเดือนก่อน +44

    I can't tell you how far above my head this video is....but it approaches infinity.

    • @aceman0000099
      @aceman0000099 4 หลายเดือนก่อน +5

      You could probably understand it if it was something physical like sorting books and you did it as a job for about an hour straight

  • @Anon-do7lo
    @Anon-do7lo 4 หลายเดือนก่อน +27

    My data compression professor last semester made really good points about how Huffman still has a place. It’s just a substitution thing and you can simply read the encoded text if you have the scheme. It’s a lot more performant when it comes to decoding.

    • @jursamaj
      @jursamaj 4 หลายเดือนก่อน +2

      Yes, the best compression schemes have multiple algorithms, and choose the appropriate one. Nobody is saying to just throw Huffman away.

    • @mage3690
      @mage3690 2 หลายเดือนก่อน

      I love asymmetric compression schemes personally, but the wiki claims that fractal compression has largely been abandoned because it became clear that fractal compression would always be very asymmetric.

  • @wikiPika
    @wikiPika 4 หลายเดือนก่อน +201

    1:25 "hello, how are you? fine, thank you"

    • @gamerdomain6618
      @gamerdomain6618 4 หลายเดือนก่อน +53

      oh my gaah!

    • @puppergump4117
      @puppergump4117 4 หลายเดือนก่อน +2

      I know that from a meme someone sent and now wonder how relevant it is to the world

    • @froxcey
      @froxcey 4 หลายเดือนก่อน +28

      Hello everynyan

    • @4.0.4
      @4.0.4 4 หลายเดือนก่อน +15

      ​@@puppergump4117the anime was surprisingly influential in that it also coined the word "waifu" from one teacher saying "mai waifu".

    • @xhbirohx2214
      @xhbirohx2214 4 หลายเดือนก่อน +1

      im out of the loop, can someone fill me in?

  • @kristianTV1974
    @kristianTV1974 4 หลายเดือนก่อน +458

    Typical fucking Microsoft, trying to patent something in the public domain.
    God how I hate that company.

    • @crix_h3eadshotgg992
      @crix_h3eadshotgg992 4 หลายเดือนก่อน

      @@koghslook up the Microsoft hate website. All tech corporations are dog shit evil.

    • @serkandevel7828
      @serkandevel7828 4 หลายเดือนก่อน +49

      ​@@koghs Microsoft is notorious for being patent-trolls towards FOSS

    • @PefectPiePlace2
      @PefectPiePlace2 4 หลายเดือนก่อน +82

      the problem is copyright/trademark/IP law, Microsoft is a symptom
      you should blame the root cause if you want the problem to be fixed

    • @4.0.4
      @4.0.4 4 หลายเดือนก่อน +14

      It's usually said of journalists but, "you don't hate Microsoft enough" 😂

    • @AlexGeek
      @AlexGeek 4 หลายเดือนก่อน +5

      They used to do that a lot before. But now they try to compete more with online services.

  • @Chalkadoo
    @Chalkadoo 4 หลายเดือนก่อน +138

    So what I got from this video is that we need patent law reforms and also Microsoft needs to get its shit kicked in REALLY hard?

    • @DiThi
      @DiThi 4 หลายเดือนก่อน +44

      Better: software patent abolition. We don't have them in Europe and we don't want them.

    • @i-am-linja
      @i-am-linja 4 หลายเดือนก่อน +1

      @@DiThi Nice, but is there a way to _prevent_ an American company from patenting your idea?

    • @i-am-linja
      @i-am-linja 4 หลายเดือนก่อน +1

      @@DiThi Or failing that, to prevent said patent being enforced in your jurisdiction?

    • @DiThi
      @DiThi 4 หลายเดือนก่อน +10

      @@i-am-linja I thought prior art would be enough, and maybe it's still enough and the patent for ANS wouldn't really hold in court, who knows. Patents are not enforced in my jurisdiction, they can't. However we live in a global world and using US patented algorithms would prevent me from publishing my software in the US.

    • @stdcall
      @stdcall 2 หลายเดือนก่อน

      hom osuck

  • @arduano
    @arduano 4 หลายเดือนก่อน +11

    Thank you so much for sharing this! I was implementing LZMA earlier this year (for fun, and to learn how it works), and when I got to LZMA's range encoding component I realized how genius this approach is. It took me days to understand, but your video would've helped so much if I was around haha

  • @Xankill3r
    @Xankill3r 4 หลายเดือนก่อน +270

    We need another edition of Math You Can't Use by Ben Clemens just to cover this BS. Well, I mean we probably need a reconsideration of the patent regime first. Software - being just maths - should *not* be granted patents at all. If the whole idea behind patents was to foster innovation then "we've been sending completely unnecessary amounts of data over the internet because patent trolls" is definitely evidence that it doesn't work that way.

    • @mytech6779
      @mytech6779 4 หลายเดือนก่อน +7

      Most of the time software cannot be simply patented, however source code is covered by copywrite. They are not the same thing. There are a few exceptions but you need to dig into the specific cases to have any meaningful discussion of that area.

    • @Xankill3r
      @Xankill3r 4 หลายเดือนก่อน +23

      @@mytech6779 copyright doesn't matter here since we are talking about patents. Copyright also only applies to the specific implementation - or expression - of an idea. Therefore although there are problems with copyright as well it doesn't at least give someone a complete right over an idea.

    • @wb3904
      @wb3904 4 หลายเดือนก่อน +2

      Maths and algorithms are fundamentally different. Computers have more in common with logic than maths.

    • @Turalcar
      @Turalcar 4 หลายเดือนก่อน +15

      @@wb3904 logic is maths

    • @wb3904
      @wb3904 4 หลายเดือนก่อน +3

      @@Turalcar no maths is based on logic not the other way around

  • @theparadox42
    @theparadox42 หลายเดือนก่อน +1

    I remember you way back on the days of KA. Awesome to see what you're making now. Keep it up!!

  • @DumToasty
    @DumToasty 4 หลายเดือนก่อน +19

    1:23 Azumanga Daioh reference 「I wish I were a bird.」

  • @dl5244
    @dl5244 3 หลายเดือนก่อน +8

    I remember seeing a demo for fractal encoding instead of JPG in the early 1990's in the days where 14.4kbps modems were still very expensive so I only had 2400 baud dialup. The demo was a real-estate website with pictures of a houses. Interlaced jpg images took a few minutes to download and view, where as the new fractal format was

    • @Tumbolisu
      @Tumbolisu 3 หลายเดือนก่อน +3

      Even with today's internet speeds, I sometimes end up having to deal with ridiculously slow connections, ranging from minutes to hours. Genuinely unable to look at pictures sent via a specific service because it loads 10 rows of pixels per second. Compression like that should be a priority for all online services, and there is no excuse for CPU load being high, considering how stupidly fast they are now.
      Oh well, how else are they gonna get us to pay more for "faster" internet?

  • @dumdum7099
    @dumdum7099 4 หลายเดือนก่อน +65

    This is way above my paygrade. I need more readings before I watch this video.

  • @sniper9143
    @sniper9143 4 หลายเดือนก่อน +7

    I recently used an adaptive Arithmetic encoder and a LSTM for a custom compression algo, the arithmetic encoder was cool to work with

  • @mind_11
    @mind_11 4 หลายเดือนก่อน +2

    I appreciate the many funny example words you used. Very epic video! Awesome visuals and interesting explanation!

    • @simonwillover4175
      @simonwillover4175 4 หลายเดือนก่อน +1

      It was a pretty serious video. None of the examples are funny.

    • @Martin-km4gb
      @Martin-km4gb 4 หลายเดือนก่อน

      ayaya

  • @ADRIANO782
    @ADRIANO782 2 หลายเดือนก่อน +3

    To be honest this should've been an hour long video, it really went over my head

  • @jb2170-mathmo
    @jb2170-mathmo 4 หลายเดือนก่อน +2

    This video is like the perfect counterpart to my #SoME entry :D
    Huffman Trees are the one part I left out of my vid, this is awesome

  • @koka3243
    @koka3243 4 หลายเดือนก่อน +1

    Entropy coding explained in 5 mins. Must be a new record 🎉

  • @randomlegodev
    @randomlegodev 4 หลายเดือนก่อน +20

    love me a good encoding algorithm breakdown 🔥
    animations lookin snazzier

  • @el_quba
    @el_quba 4 หลายเดือนก่อน +99

    This is an excellent explanation no doubt, but I must be honest I find it very hard to follow and understand

    • @kitamashi
      @kitamashi 4 หลายเดือนก่อน +39

      It is only a good explanation if you are familiar with the jargon, in other words, it is a bad explanation.

    • @jmsether
      @jmsether 4 หลายเดือนก่อน +15

      It's basically an academic paper in video form. I found the information to be well laid out and easy to follow.

    • @atiedebee1020
      @atiedebee1020 4 หลายเดือนก่อน +13

      If its hard to follow and understand, its not a good explanation

  • @kuroclown
    @kuroclown 2 หลายเดือนก่อน

    3 minutes into the video and i knew.....my brain is smooth, very smooth

  • @Flourish38
    @Flourish38 4 หลายเดือนก่อน +61

    Saw some comments complaining about the music, and just wanted to say that I disagree completely. Having some music is nice, and the choices you made are very low energy and out of the way anyways.

    • @yuriiknapik7241
      @yuriiknapik7241 4 หลายเดือนก่อน +4

      Yeah i can see your point.Maybe he could sidechain the music so when he speaks the music dives in volume more.

    • @gnt10
      @gnt10 2 หลายเดือนก่อน

      I think its fine as well, though it would be nice to have an option on yourube to turn off the bg music if you have your own prefered soundtrack, though the video would need to made withouth merging the soundtracks making the file a bit bigger😅
      I dont think that should be a huge isue🤔

  • @leeroyjenkins0
    @leeroyjenkins0 4 หลายเดือนก่อน +39

    Slight gripe, at 9:00 it was a bit confusing that you described the operations as "outputting a bit representing which half we're in" and "centering and renormalizing" instead of just saying we want to find a number within the range, that halving is just how decimal binary works, and briefly pointing out why the "centering and adding 0s" works.
    Otherwise interesting video, and cool subject, thanks!

    • @MADSK_LLZ
      @MADSK_LLZ 4 หลายเดือนก่อน +6

      Yeah, he lost me at that part, too.

    • @k1ll3trs86
      @k1ll3trs86 4 หลายเดือนก่อน +2

      I can see the intention of the video
      But what he doesn't realize is that thinking in whole probability is not functional, that's not how machine works and it's ultimately extremely inefficiency.
      It's a classic "looks good on paper" but in practice you are trying to force something that doesn't fit

    • @Kynatosh
      @Kynatosh 4 หลายเดือนก่อน +1

      Took me some time wonderi'g what he was getting at but it works I guess
      I understood at the end of his explanation that it was just writing the number in base 2

    • @anonymous-q2b5s
      @anonymous-q2b5s 4 หลายเดือนก่อน +1

      ​@@k1ll3trs86I don't understand what you're trying to say. This is not only good on paper but good in practice too. It is literally strictly better than huffman and huffman is used everywhere.

    • @ZeroPlayerGame
      @ZeroPlayerGame 4 หลายเดือนก่อน +1

      @@anonymous-q2b5s I think what's meant is that keeping the whole real number in memory and then writing it out in binary is a bad idea (because doing math on long real numbers is a bad idea).

  • @Mnnvint
    @Mnnvint 3 หลายเดือนก่อน

    This was a heroic effort at an explanation video, but it should really have been a 10 video series at least.

  • @Psychx_
    @Psychx_ 4 หลายเดือนก่อน +10

    There is a FUSE file system called PiFS. It can store data using some sort of arithmetic coding where the resulting number itself gets compressed as an offset + length into the unlimited digits of Pi.

    • @vurpo7080
      @vurpo7080 4 หลายเดือนก่อน +4

      Relying on the assumption that pi is a normal number... which is probably true but not proven!

    • @BridgeBum
      @BridgeBum 4 หลายเดือนก่อน

      ​@@vurpo7080That would be one hell of a bug to run into some day.

    • @ThylineTheGay
      @ThylineTheGay 4 หลายเดือนก่อน

      @@vurpo7080 even if it's not, you can just split the data you want to add, might even act as a kind of data de-duplication

    • @ThylineTheGay
      @ThylineTheGay 4 หลายเดือนก่อน

      i would genuinely like to see someone make a semi-serious version of that

  • @Shirani007
    @Shirani007 3 หลายเดือนก่อน

    What a wonderful presentation and justification of ANS you have posted. I loved watching it, although speech is a bit faster, I used .75 speed 😊. I would reccomend you create more videos explaining other compression algorithms too, which in fact have arithmetic roots. Highly recommend and highly appreciated dear !
    Onw video I would want can be on different methods of approximation, and second on almost all, mostly forgotten bitcodes generation methods in 1 go. Also error correction, hamming codes, reed Soliman codes too , etc.
    Looking forward to more content ❤

  • @neogen23
    @neogen23 3 หลายเดือนก่อน

    I stumbled upon ANS for the first time in this video, so thanks, but I didn't understanding it at all. I had to watch Stanford's EE 274 on the subject, where it seemed ... approachable, at least. Not that a video of this length was going to fully explain such fairly advanced topics, the pace is quite fast, but it was an excellent introduction to the subject, and it motivated me for further research, so goal achieved I guess. Also the effort in making it really shows.

  • @LovelyBozo
    @LovelyBozo 4 หลายเดือนก่อน +10

    stuff like this is why I'm glad I took AP stats so I can come back and understand what I'm listening to later on this year

    • @tandemdwarf745
      @tandemdwarf745 4 หลายเดือนก่อน +4

      I got a 5 on AP stats and I'm pretty lost lol. This video is less stats heavy, at least the stuff you learn in AP, and more computer science jargon heavy. It also just moves uncomfortably fast for the stuff I have some handle on.

  • @JTCF
    @JTCF 4 หลายเดือนก่อน +11

    I am usually not in the "wow this is so cool but I didn't understand anything" group, but I think I might be with this vid. The video is paced pretty fast, if I wanted I would probably rewatch it. Currently I just feel like falling into a rabbit hole (which happens to me quite a bit) but not keeping up with the falling.

  • @05degrees
    @05degrees 4 หลายเดือนก่อน +10

    Haven’t watched yet but there’s a neat thing: arithmetic coding, or a closely related thing, allows one to convert a sequence of fair N-sided coin outcomes to any other discrete uniform distribution without rejection, though if you don’t want to reject you’ll need unbounded memory, and if you want to flush time to time, you’re effectively rejecting some information you’re getting from the initial sequence. But at least you can fine-tune how much you want to reject/save and change it over time!
    Maybe it’s already in the video but if it’s not, here. Also it’s as if there’s some duality about entropy generated by rejection, non-uniformness introduced if we decide to reject only a given number of times and then be complacent, working memory etc.. I don’t know in what kind of inequality this can be written but I hope information/coding theorists have it already discovered and generalized similarly to what happened to the uncertainty principle (in QM and Fourier analysis-now there exists a general operator algebra approach IIRC).

  • @MadocComadrin
    @MadocComadrin 4 หลายเดือนก่อน +1

    For the folks interested in functional programming or calculating correct programs, Richard Bird's "Pearls of Functional Algorithm Design" has two chapters on arithmetic coding, one on the rational number variant and one on the integer variant.

  • @steveunderwood3683
    @steveunderwood3683 4 หลายเดือนก่อน +1

    Arithmetic coding is widely used. It was the only option for some standards, even before the patents expired. So, it got used, patents or not. I've had to implement it for things like JBIG coding for FAX, where its specifically there to improve on the huffman coding in the previous compression methods used for FAX.

  • @AhrkFinTey
    @AhrkFinTey 4 หลายเดือนก่อน +1

    Nice to see someone who appreciates the second Arabesque!

  • @jazzerbyte
    @jazzerbyte 4 หลายเดือนก่อน +6

    I wonder why Apple rolled out HEIC instead of just using arithmetic compression, considering it would already be incompatible with standard JPG?

    • @vylbird8014
      @vylbird8014 4 หลายเดือนก่อน +12

      There is a technical and a business reason.
      The technical is that HEIC - like the rival AVIF - goes far beyond just arithmetic compression. It's really an adaptation of techniques developed for video compression, and will wipe the floor with JPEG, even arithmetic-compressed JPEG.
      The business reason is that HEIC, being based on the h265 video codec, is a heavily patented format - and one of the contributors to that patent pool happens to be Apple. Not only do they get to use it at a reduced licencing cost, but whenever another company wants to use HEIC, Apple gets a cut of the royalties. It's clear that JPEG is going to decline, and there are a number of formats competing to replace it - it's in Apple's best interests to make sure that the format they stand to profit from is the one to achieve dominance, and making HEIC the default format for taking photos on iPhones and iPads gives it a huge advantage - every photo-sharing service needs to at least support uploading HEIC, or else Apple users will find their photos won't share.
      There's a rival format, AVIF, which is supposed to be patent-free (you can never be sure) and pushed by a rival consortium, with Google as the major backer. They aren't trying to score any patent money, but that doesn't mean it's entirely altruistic: Their aim is to promote an open and unpatented format in order to avoid having to pay up royalties to the likes of Apple.

  • @ron5948
    @ron5948 2 หลายเดือนก่อน

    How 8:45 do you store all messages and allow easy lookup??? There are a few REALLY one excellent, and many adaptive ones, approach here 8:45

  • @4.0.4
    @4.0.4 4 หลายเดือนก่อน +14

    I'll be honest it was hard to follow, maybe I'm just dumb 😅
    The music was fine imo, but if people complain you can "duck" it (the volume lowers on speech).

    • @jeffreychandler8418
      @jeffreychandler8418 4 หลายเดือนก่อน +7

      no you're not dumb. This video seemed to be made for an audience that already knows compression algorithms and information theory at a competent level.

    • @ArnaldurBjarnason
      @ArnaldurBjarnason 4 หลายเดือนก่อน +5

      @@jeffreychandler8418 Not even, I'm pretty familiar with many of the concepts but he's just running through the ideas with no space to breathe or re-emphasis on the critical aspects.

    • @jeffreychandler8418
      @jeffreychandler8418 4 หลายเดือนก่อน +1

      @@ArnaldurBjarnason That's validating to hear xD I usually ascribe misunderstandings as a personal issue than a communicator issue (unless it's REALLY bad like the Zion NPS instagram or something like that).
      As I think about this video again, honestly a two minute succint summary of existing algos would be great, then trim the fat on the new algos to make those simpler, and allow more breathing space.

  • @thedeemon
    @thedeemon 3 หลายเดือนก่อน

    I actually used both algorithms in my lossless video codec - ScreenPressor. Initially it was range coding (arithmetic coding with integers) but it involved division which isn't very fast. Then later version used rANS which is faster but requires compressing data in reverse order. The end result was pretty great: best in class compression and good speed (when the content is not too complicated, as in typical screen data).

  • @fuzzy-02
    @fuzzy-02 3 หลายเดือนก่อน

    I understood about 25% of the video on my first try!

  • @TreeLuvBurdpu
    @TreeLuvBurdpu 4 หลายเดือนก่อน +5

    I'm stuck on the variable length encoding in your explanation. "Shorter codes encode more frequent symbols". How do you indicate the length of bits that should be decoded into a single symbol? Presumably, short codes would exist inside longer encodings, for instance "101” is inside " 1011”, and "10” is inside ”101”.

    • @inanefool8781
      @inanefool8781 4 หลายเดือนก่อน +15

      In the Huffman algorithm, the codes used are designed as such that no shorter code is inside a longer code.
      So in the example you've provided, 1011 would mean you have the symbol "101" and then you have another symbol starting with "1". The way huffman is usually taught in school, this involves following a tree, where 1 is right and 0 is left, and the decoder traverses that tree with each bit you read until you eventually hit a symbol, output that symbol, and then go back to the top of the tree.
      with that explanation, the symbol encoded for 101 would mean you went right, left, right in the tree, hit a symbol, and went back to the top.
      Hopefully this explanation is easy to understand, it would be better with diagrams but those are hard to make in youtube comments.

    • @tandemdwarf745
      @tandemdwarf745 4 หลายเดือนก่อน +2

      @@inanefool8781 You can pause the video at 1:48 to see this effect; all the shorter codes are not contained within the longer ones.

    • @TreeLuvBurdpu
      @TreeLuvBurdpu 4 หลายเดือนก่อน +3

      @@inanefool8781 that is actually a very good explanation that is ready to understand. Thanks. Hopefully i can remember it this time.

    • @antoine2571
      @antoine2571 4 หลายเดือนก่อน +2

      By the way a coding algorithm where this property holds (no word is prefix of another one) is called a prefix code.
      Iirc from reading CLRS, a non-prefix code can always be transformed into a prefix-code, so usually textbooks focus on prefix-code.
      Though I have very few knowledge about codes so I might be wrong.

  • @fedang
    @fedang 4 หลายเดือนก่อน +1

    Imagine leaving your hard work public for the sake of humanity and greedy corporation takes credit for it and ends up forgetting it in their patent drawer

  • @dadamaldasstuff1816
    @dadamaldasstuff1816 4 หลายเดือนก่อน +45

    You can use the JUSDTB compression (JUSDTB stands for Just Use Smaller Data Types Bro)

    • @dadamaldasstuff1816
      @dadamaldasstuff1816 4 หลายเดือนก่อน +14

      This doesn't actually exist, it's a joke.

    • @wall-e5869
      @wall-e5869 4 หลายเดือนก่อน +6

      this is buhdj
      bro
      ur
      hillarious
      dad
      jokes

    • @dadamaldasstuff1816
      @dadamaldasstuff1816 4 หลายเดือนก่อน +15

      I'm pointing at YOU, database users! You should know the data types and use them accordingly.
      You don't need a string to store one of 3 states, use an enum.
      You don't need an int for a number between 0 and 100, use a byte. It's 4x smaller.
      You should always use unsigned if you don't need negatives. It doubles the positive range.
      Also use decimal, it can store decimal places more precisely than float. If you don't have a horrendously huge range, just use it.

    • @ThylineTheGay
      @ThylineTheGay 4 หลายเดือนก่อน +1

      I prefer πFS personally

    • @Kynatosh
      @Kynatosh 4 หลายเดือนก่อน

      ​​​@@dadamaldasstuff1816wdym? A string can be two bytes of it holds just one char (and the null char appended). An enum is in most languages an integer, which can be 32 or 64 bytes and use up more space. I'm sure the db stores it at one byte though
      I still agree that you should always use the right type though. Even helps you catch bugs early when you get screamed at for having the wrong type

  • @dwighthouse
    @dwighthouse 16 วันที่ผ่านมา

    It would be nice if the video with a title about halving image file sizes actually discussed halving image file sizes. Can you explain why this is specifically good for image compression? Are there any actual implementations of arithmetic encoded images and how do they compare to huffman encoded ones, in terms of filesize? How did you arrive at the 50% improvement value? JPEG-XL, which is the only format you mentioned that kinda used the ideas presented in this video, only has a guaranteed file savings of about 20% when using the JPEG Re-encoding feature on an image that was already a JPEG. How does this raw data stream compression algorithm compare to the image-specific compression algorithms used in various formats?

  • @__MINT_
    @__MINT_ 4 หลายเดือนก่อน

    Quite interesting subject, I didn't treat probability as something related to my interest field, until now. A year ago I started writing my own lossy audio codec, which is done for now, but I didn't implement any kind of entropy coding. I tried Huffman, RLE, but only thing I got was bigger file size and increased complexity. The problem with these algorithms is that there must be a defined set of symbols and a strong pattern in encoded message, otherwise it won't work. Using raw binary sequences as symbols doesn't really work, since there is quite a lot of them and probabilities for each other sorted in decreasing order are far away from shape of 1/x function. But the only pattern I saw were groups of 0's with less 1's in between. I still can't understand how MP3 pre-processes values so that they are suitable for entropy coding. By saying values I mean quantized frequency coefficients after transform from time to frequency domain, MDCT in my case. After psychoacoustic analyzis I know which coefficients can be discarded and what precision to use for the rest of them. Coefficients are divided into subbands with fixed size, let's say 16, and then they are quantized with different precisions. Single frame can have subbands quantized with both 2 bits and 16 bits, although now I know that 16 bits was way too much even for almost lossless compression. Each subband has its scalefactor field and a precison field, saying how many bits are used per coefficient and what scalefactor was used to divide values to quantize them. Quantized values can be basically anything, I didn't see any strong pattern there. From what I heard, MP3 uses more or less fixed precisions for coefficients, and maybe that helps to entropy code them. But in my codec everything is dynamic and bitrate can vary from for example 20 kbps to 400 kbps for the same file and same quality setting, 20 kbps would be used to encode a single tone, while 400 kbps would be needed for a drum hit. Even quantized values are not fixed-length, for example with 3-bit precision, 0 would be just 0 in binary, 1 would be 100, 2 is 101, -1 is 110, -2 is 111. Scalefactors and precison fields also have variable lengths and are differential-coded. There are also magic values for those fields which can specify number of empty subbands, so few tens of empty coefficients can be stored on a few bits. And these aren't all clever mechanisms that I used instead of entropy coding. But I'd like to understand how to implement this coding for audio / how to prepare values for that coding. With that and some other changes I could create even better codec, which already outperforms MP3 a bit. It's called DAC (Dynamic Audio Codec), and I have a video about it. Unfortunately in polish, but today it shouldn't be a problem. I also shared the working codec (browser it all you need to run it), so take a look if you want. Anyways, thanks for the video and I'm waiting for the next one!

    • @__MINT_
      @__MINT_ 4 หลายเดือนก่อน

      Oh, and there is a link to a video about DAC. YT sometimes removes such comments, so I'll share just the video ID: Qy_AinciI9Y

  • @RichardBronosky
    @RichardBronosky 3 หลายเดือนก่อน +1

    11:56 took me [I stopped counting at 6] times listening before I could hear "extra digit".

  • @liliangimenez4461
    @liliangimenez4461 4 หลายเดือนก่อน

    Thank you this helps me to understand arithmetic encoding and ANS.
    Both (+b Huffmann) assume the symbols are independant. Could Markov chains improve compression rates, or techniques based on repeated occurrences makes Markov chains irrelevant ?

  • @rizqiv2
    @rizqiv2 4 หลายเดือนก่อน +48

    i dont understand anything :(

    • @gnt10
      @gnt10 2 หลายเดือนก่อน +1

      Same, i have no idea how to code this at all, or if it even makes sense😔

    • @rizqiv2
      @rizqiv2 2 หลายเดือนก่อน +1

      @@gnt10 i even master on some coding skill but still not understand :D

    • @LzkLdg
      @LzkLdg 2 หลายเดือนก่อน +2

      I found the comment where I belong. 😊

    • @happyatheists9361
      @happyatheists9361 หลายเดือนก่อน

      😁

  • @Cl0udWolf
    @Cl0udWolf 4 หลายเดือนก่อน +1

    Great manim animations. It’s a shame ANS isn’t as widely used. There are similar patent issues with lossy compression algorithms as well, but those are even more complex since there are two vectors of optimization (compression and reconstruction accuracy) where the second does not have a concrete way to measure it.

    • @Aera223
      @Aera223 4 หลายเดือนก่อน

      There are some synthetic benchmarks, like 7zip used to show programs compressed with zip vs 7zip. A standardised benchmark may be useful, though compressors could overfit for that benchmark and look better than they really are

    • @Cl0udWolf
      @Cl0udWolf 4 หลายเดือนก่อน

      @@Aera223 well zip and 7zip are still lossless, but as for lossy a benchmark doesn’t exist for that reason. Some of the reconstruction quality will be subjective. PSNR vs Compression Rate is the most common metric but PSNR can be high while having a very noisy output, or a singular spike of error, as long as it averages out. Some situations lossy compression will have rules for the application such as not deviating more than X amount, or always producing smoothed results

  • @scj643
    @scj643 3 หลายเดือนก่อน

    AMR-WB+ is the same place for audio encoding/decoding. It's a very specific codec used in talking books that had its patent expire in 2020

  • @rivest-oss
    @rivest-oss หลายเดือนก่อน

    THIS VIDEO IS SOOO GOOOOD!

  • @quintopia
    @quintopia 4 หลายเดือนก่อน +3

    When i implemented arithmetic coding a few years back, i had no idea of the fraught patent history. Nice bit of info.

  • @Mega-wt9do
    @Mega-wt9do 4 หลายเดือนก่อน +6

    This channel is a hidden gem :O

  • @mullanalle4318
    @mullanalle4318 3 หลายเดือนก่อน

    Really well done video. Thank you!

  • @Иван-я1р5т
    @Иван-я1р5т 4 หลายเดือนก่อน +1

    That was quite interesting!
    Also does anyone knows the song at 10:35?

  • @Neptoid
    @Neptoid 4 หลายเดือนก่อน +1

    What about AI? It predicts the probability of the next tokens, why not the Hofphman thingy given those priors

    • @yuantan9292
      @yuantan9292 3 หลายเดือนก่อน

      The probability prediction and the encoding method are two separate steps, and the video just focused on the latter and not the former.
      In most of the video, the probability is fixed (or at least known) and independent, so there's no need to predict anything.
      In benchmarks using real world data, AI is indeed used for the probability prediction; for example, the top 23 (highest compression ratio, lowest file size) compressors of the enwik9 dataset all used some sort of AI and the current best (nncp v3.2) used a Transformer model.
      In real life applications though, AI is too slow compared to simple, non-AI predictors, so the smaller file size is hardly worth the extra time and power cost. While nncp v3.2 can compress enwiki9 to 49% of that of zstd (zstd 0.6.0 level 22 ultra), it takes 9.6x as much memory and 683x as much time to compress and then decompress the dataset.

  • @flameofthephoenix8395
    @flameofthephoenix8395 4 หลายเดือนก่อน

    0:28 While of course it is stored as a number, any piece of data on a computer is a series of 0 or 1 bits, there are not themselves numbers, but any series of them can also be represented by a number.

  • @notu483
    @notu483 4 หลายเดือนก่อน +1

    3:10 Why is there a factor of 1/N if it approaches infinity? Won’t that just lead to 0

    • @jacobschweiger5897
      @jacobschweiger5897 4 หลายเดือนก่อน +2

      1/n approaches 0 as n goes to infinity but the sum of the series of 1/n diverges as n goes to infinity.

    • @dimihi1173
      @dimihi1173 4 หลายเดือนก่อน +1

      1/N represents the dividing by N involved when calculating the average(mean)

    • @asdfqwerty14587
      @asdfqwerty14587 4 หลายเดือนก่อน

      I don't really know what all the terms/notation are in that equation so I can't really analyze it properly, but probably the sum is an infinite value, so it becomes a 0*infinity indeterminate form.

    • @vedantneema
      @vedantneema 4 หลายเดือนก่อน

      The summation scales up linearly as N increases so it'll just diverge to infinity. If your message is twice as long, on average you'll need twice as many bits to encode it. So, the 1/N is there to give us a more useful number to measure coding efficiency.

  • @lnee
    @lnee 3 หลายเดือนก่อน

    I realize I came up with the arithmetic numbering system algorithm when I was 14 while creating a chess compressor. Literally that algorithm.

  • @joshuabrown4952
    @joshuabrown4952 4 หลายเดือนก่อน +3

    I hope some google/amazon data centre engineers are taking notes, if they can hear the video over the water pumps out there in the Nevada desert.

  • @michaelbuchholz992
    @michaelbuchholz992 3 หลายเดือนก่อน

    I first heared of this about 30 years ago and i was so excited that i wrote it in C.
    I can acknowledge, it's compressing better than huffmann, but twice? Maybe with very specific data.
    But it was quite slow. Most probably because i wrote the adaptive version.
    It should be written in assembly, because it has some very special needs that no higher programming language provides. And writing a workaround to get that missing feature in C is a pain in the afternoon and also slow.
    Maybe using libgnump will be helpfull, if you don't want to use assembly - libgnump is written in assembly, as far as i know.

  • @BGBTech
    @BGBTech 3 หลายเดือนก่อน

    A downside with a lot of these is that they are slow: Arithmetic coding = Very slow, Bitwise Range Coding = Rather slow (but faster than traditional arithmetic coding), LZMA / 7zip uses this. Huffman can be made semi-fast (and a little faster if one limits symbols to 12 or 13 bits vs 15 or 16; with a lower limit the decoder's lookup table can fit into the L1 cache).
    Rice coding can also be made fast (more so when a similar length-limited is applied), but only works well with particular probability distributions; but can be turned into an adaptive scheme with relatively little added cost. One trick I have used is to combine Adaptive Rice with a "swap towards front" table (each symbol is encoded as an index into a table, and when encoded, the symbol at index J is swapped with the symbol at index (J*15)/16, which is used the next time this symbol is encoded). This scheme is adaptive, can be almost as fast as Huffman, and almost as good (in terms of compression) while needing less memory footprint (sometimes useful) and less code (also sometimes useful), and (unlike Huffman) the decoder lookup tables can be made read-only. But, a lot depends on what one needs...

  • @CSMHD
    @CSMHD 4 หลายเดือนก่อน

    thank you great insight - good explanation - there were some wow moments - thanks

  • @isidor37
    @isidor37 3 หลายเดือนก่อน +2

    This is an interesting topic and discussion, but the presentation and explanations were really hard to follow. The music was *mostly* unobtrusive, but there were a couple points where it really came into the foreground due to the tonal content and became distracting

  • @marcfruchtman9473
    @marcfruchtman9473 4 หลายเดือนก่อน

    It is important to understand that the Huffman Encoding requires that it build a "dictionary" (Huffman Tree) with the encoding and that must be included in the file so it increases the size of the compressed file. Additionally, the "encoding" used by Huffman is "Prefix-Free", meaning that no symbolic encoding can be a prefix for any other encoding used in the compression dictionary. It might have been one of the reasons why people were having trouble following. Maybe add the tree and prefix free quality in your next revision.

  • @lainwired3946
    @lainwired3946 4 หลายเดือนก่อน +1

    Not finished the video, but are they as efficient at decoding? I realise theres a lot of overheads on modern devices, but decoding and rendering images happens so often in a normal user session, it does make some sense to trade minimal disk space in the large picture for faster decoding. Just an idea on another reason they may not have moved on either?

    • @arxalier2956
      @arxalier2956 4 หลายเดือนก่อน +2

      Even if its slower to decode, just like zip files take a while to open, it would be worth doing it. Transferring compressed data through a network is worth the cost of computation and time it takes to encode/decode at the sender or receivers end, because transferring uncompressed data takes even more time and money than that

    • @lainwired3946
      @lainwired3946 4 หลายเดือนก่อน

      @@arxalier2956 not all images get transferred through networks though and webmasters have to make tradeoffs between optimization on their end and load speed on the users end. You lose something like 50% of your visits if the page still is loading after 3-5 seconds

    • @uis246
      @uis246 4 หลายเดือนก่อน +1

      ANS is super fast at decoding. Especially tANS.

    • @lainwired3946
      @lainwired3946 4 หลายเดือนก่อน

      @@uis246 faster than Huffman? If so, interesting. Was just an idea that came to mind.

    • @uis246
      @uis246 4 หลายเดือนก่อน +2

      @@lainwired3946 oh. I thought you was asking ANS relative to arithmetic coding. Derp. Arithmetic coding is slower than Huffman and ANS I think is faster than Huffman.

  • @Seedwreck
    @Seedwreck 2 หลายเดือนก่อน +1

    I recommend anybody using this method to also implement try to implement more layers, compressing with algorithms and compressing neighbour pattern delimiters with some form of character.
    This will greatly improve the compression and complexity of your data!

  • @WiseWeeabo
    @WiseWeeabo 4 หลายเดือนก่อน +15

    literally anything can be interpreted as "a single number"..

    • @ZeroPlayerGame
      @ZeroPlayerGame 4 หลายเดือนก่อน

      that's pretty much the point of the video yea

    • @Mnnvint
      @Mnnvint 3 หลายเดือนก่อน

      @@ZeroPlayerGame Find the probability of everything, and interpret "1" as the most probable thing, "2" as the second most probable thing, etc. Voila, optimal compression! there may be some slight implementation details to be worked out...
      Seriously though, one thing I found out many years ago which stuck with me, is that most compression algorithms don't use the whole number line this way. In fact, they only use a tiny fraction of the numbers on the number line. Case in point: A random file is practically never a valid zip file, not even if you strip away headers and other format bookkeeping stuff.
      Even arithmetic encoding and ANS (as described here) leave numbers on the table, so to speak. Mostly due to end handling. Defining a bijective compression scheme (where every sequence of bits decodes to just one message, and all messages can be encoded) is fiddly and most compression people don't see the point. But I think such algorithms are beautiful.

    • @ZeroPlayerGame
      @ZeroPlayerGame 3 หลายเดือนก่อน +2

      @@Mnnvint what's "2"? What's your alphabet? Huffman is bijective. Arithmetic coding is bijective if you encode a whole number of bits (might not be the case). The number line is a mathematical abstraction and doesn't exist in the physical world where encodings have to live.

    • @Mnnvint
      @Mnnvint 3 หลายเดือนก่อน

      @@ZeroPlayerGame Find the probability of the most likely message, and assign it to "1" on the number line. Find the second most likely message, and assign it to "2". There is no alphabet, we deal with the whole message at once.
      "Encodings" are a mathematical abstraction too, so they certainly don't have to live in the physical world if numbers don't :) Personally I like mapping things to the natural numbers/positive integers.
      I specifically said arithmetic encoding is not bijective *as described in this video*. There exist bijective arithmetic encoding implementations. There even exist bijective variants of Huffman, though that should probably be considered a variant and not just a different implementation.
      Bijective compression has a special meaning. It doesn't just mean that there's a bijection between messages and encodings, that's a much lower bar (although most compression methods don't meet that either). Bijective compression means that in addition to satisfying this weaker constraint, every sequence of bytes is a valid file in the compression format. There can be no "unexpected end of file" or other format errors, any sequence of bytes has to decode to a unique sequence of bytes.

  • @Fine_Mouche
    @Fine_Mouche 4 หลายเดือนก่อน

    what about the fidelity of the image ?

  • @Rudxain
    @Rudxain 4 หลายเดือนก่อน

    I wonder if P-adic numbers could help improve these algorithms

  • @RikAllen-b1c
    @RikAllen-b1c 3 หลายเดือนก่อน

    Erm, pretty sure I’m watching the video encoded with arithmetic coding. It’s the basis of h.265 and h.264 video codecs, s as long with their still image profiles.
    H.264 baseline profile is closer to Huffman, but the main profiles expect to use context adaptive binary arithmetic coding.

  • @teltubes1
    @teltubes1 4 หลายเดือนก่อน

    Great animations. I understood nothing about the proof but the examples are nice. Is there a specific software you are using for this presentation?

  • @MegaArti2000
    @MegaArti2000 2 หลายเดือนก่อน

    Nice video! Just one suggestion: don't let the video black-out for so long while you're talking. I was confused thinking something was wrong with the image, because you were still talking regardless hahaha

  • @automatescellulaires8543
    @automatescellulaires8543 2 หลายเดือนก่อน

    is there any legal risk to bake the ideas pointed at in this video into a commercial product ?

  • @sasjadevries
    @sasjadevries 2 หลายเดือนก่อน

    Interesting topic, good explanation, nice animations, and a special thanks to Frédéric Chopin.
    TH-cam, is as bad as the US patent office, and doesn't recognise all authors of the music. YT didn't recognise Satie either.

  • @apricotcomputers3943
    @apricotcomputers3943 4 หลายเดือนก่อน

    blipblipblirip, you're jobs taken😂

  • @The_Pariah
    @The_Pariah 3 หลายเดือนก่อน

    Middle out?
    Do we have a new contender for TechCrunch Disrupt??

  • @flameofthephoenix8395
    @flameofthephoenix8395 4 หลายเดือนก่อน

    13:13 I'm not a big fan of the probability part of this but I suppose it is likely a good thing overall seeing as it is beneficial to your average image or text. Unless the entire thousand letter long text document is just repeated "As" you'll have no problem.

  • @timmygilbert4102
    @timmygilbert4102 3 หลายเดือนก่อน

    What about markov compression? 😮

  • @xdasdaasdasd4787
    @xdasdaasdasd4787 2 หลายเดือนก่อน

    I got lost in the first 10 minutes haha, all the math killed me

  • @NorthWay_no
    @NorthWay_no 4 หลายเดือนก่อน

    Some day I will grok how you output a Huffman file with the weights included, or possibly how they are inferred from an adaptive version - and not taking up masses of space, relatively, for small files.

  • @pistachos4868
    @pistachos4868 4 หลายเดือนก่อน +7

    1:12 azumanga daioh reference😺

    • @JesusPlsSaveMe
      @JesusPlsSaveMe 4 หลายเดือนก่อน +2

      Revelation 3:20
      Behold, I stand at the door, and knock: if any man hear my voice, and open the door, I will come in to him, and will sup with him, and he with me.
      HEY THERE 🤗 JESUS IS CALLING YOU TODAY. Turn away from your sins, confess, forsake them and live the victorious life. God bless.
      Revelation 22:12-14
      And, behold, I come quickly; and my reward is with me, to give every man according as his work shall be.
      I am Alpha and Omega, the beginning and the end, the first and the last.
      Blessed are they that do his commandments, that they may have right to the tree of life, and may enter in through the gates into the city.

    • @MadocComadrin
      @MadocComadrin 4 หลายเดือนก่อน +1

      Oh my Gah!

  • @RandomGeometryDashStuff
    @RandomGeometryDashStuff 4 หลายเดือนก่อน

    07:41 but now 0."10" and 0."100" are duplicates

  • @juancarlospizarromendez3954
    @juancarlospizarromendez3954 4 หลายเดือนก่อน

    I suggest it for the research, to use nibbles 0..9A-F instead of decimal digits or binary digits (bytes as digits are too symbols).

  • @galewallblanco8184
    @galewallblanco8184 4 หลายเดือนก่อน +1

    "this is one of the way your jpegs [...] compress data without losing information"
    jpeg: lossless compression
    jpeg does lose information, but it was optimized to not lose too much *meaningful* perceptible information (that is until you zoom in)

    • @beardymcbeardface69
      @beardymcbeardface69 3 หลายเดือนก่อน

      JPEG uses both lossy and lossless compression. The lossy compression first, then lossless compression of that as a last step. Obviously with the final result being lossy.
      The lossless last step is a variant of Huffman.
      So to say that JPEG compresses data without losing information is a half truth. Badly worded on his part, I'd say, without clarifying that it uses both and the final outcome is indeed lossy.

  • @Siroitin
    @Siroitin 2 หลายเดือนก่อน

    Huffman also accepts that "01" can be multiple characters like "you"?
    Also H(X) is sometimes called Shannon's measurement of Information (SMI)

  • @jan-pi-ala-suli
    @jan-pi-ala-suli 2 หลายเดือนก่อน

    “i reject your rejection”

  • @romanhimmes171
    @romanhimmes171 4 หลายเดือนก่อน +1

    I thought that x.264 uses arithmetic compression.

  • @DB-Barrelmaker
    @DB-Barrelmaker 3 หลายเดือนก่อน

    I almost understood this but there was a damn plane kept flying over my head

  • @puppergump4117
    @puppergump4117 4 หลายเดือนก่อน

    I think you'd make a great youtuber, depending on how long it took you to make this. If you keep the style and make some in-depth videos on things where online resources suck donkey like shaders or common library implementations you'd probably hit 100k before the year ends.

  • @_soundwave_
    @_soundwave_ 4 หลายเดือนก่อน

    What if someone discovers these patented algorithms? Can they still be sued ? Are they legally allowed to use it?

  • @uis246
    @uis246 4 หลายเดือนก่อน +5

    You made Jarek happy. He looks pleased in JXL discord server.

  • @Kynatosh
    @Kynatosh 4 หลายเดือนก่อน +2

    Before 11:11, I was like great, this is cool but doesn't seem that helpful
    After that simple change, everything makes so much sense

  • @blacky7801
    @blacky7801 4 หลายเดือนก่อน +2

    Just do a binary search on the list of all words? brilliant!

  • @abd_cheese7353
    @abd_cheese7353 4 หลายเดือนก่อน +5

    This channel is boutta blow up