TO CLARIFY: don't be afraid to use these algorithms! The patents on arithmetic coding have all expired, and the only justifiably patentable material in Microsoft's patent is specific implementations of adaptive-probability rANS (which JPEG XL doesn't actually use). Microsoft has also given anyone permission to use it in open-source, royalty-free codecs. At around 7:20, I put "NH(X) = log_2 X" ... but H(X) is the entropy of a symbol, not of the code! The X on the left is a symbol, and the X on the right is the code ... sorry! Let me know of any other corrections, and thank you for watching!! I saw some mixed reactions to the music. I made a poll about this for future reference, but I don't know how to link it here, so please check it out on my channel
for the purpose of encoding files with even the full ASCII set, wouldn't either of these methods fall down? Because instead of being base 10 we are base 256 converted into base 2. Granted, the opcode values would have very low occurrence for a probabilistic approach (unless the file contains assembly). Then how do these algorithms deal with character sets like UTF-16 or UTF-32? Wouldn't that imply that even the arithmetic approach would need a given char value to be 32 bits, at which point we are in an almost uncompressed state?
@@gardian06_85 UTF-8, UTF-16, and UTF-32 are all encodings for the same character set. So you could compress a UTF-32 file by encoding it as UTF-8 and then using one of these methods. You could also use 64-bit integers, or even greater bases. But most compression formats don't encode bytes directly. They encode symbols that are instructions to the decoder, like a little machine. "Copy the next three bytes to the output stream." "Look back 40 bytes in the output and copy 25 again."
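A toy decoder for that kind of instruction stream might look like the sketch below (the command format is invented here just to illustrate the idea, not how any real format encodes it):

```python
def run_commands(cmds):
    # cmds is a list of ('lit', bytes) -> append literal bytes, or
    # ('copy', dist, length) -> re-copy bytes starting dist back in the output
    out = bytearray()
    for cmd in cmds:
        if cmd[0] == 'lit':
            out += cmd[1]
        else:
            _, dist, length = cmd
            for _ in range(length):  # byte-by-byte so overlapping copies work
                out.append(out[-dist])
    return bytes(out)

# "abc", then "look back 3 bytes and copy 6" expands a repetition
print(run_commands([('lit', b'abc'), ('copy', 3, 6)]))  # b'abcabcabc'
```

Note that the copy length may exceed the distance: copying byte-by-byte lets a short history generate a long run.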
You can also get the code from the h264 or hevc reference encoder (which both should only use expired patents by now) for how they handle the variable probabilities and how they implement it. Basically there's a bunch of bins that store probabilities and they have parameters for a starting probability and how much it should be updated each time.
Why the hell can you patent something from the public domain? It should never be enforceable, because anyone can claim they obtained it from the original public domain source itself.
@@KillFrenzy96 They didn't patent it directly, they patented some straightforward expansions and optimizations that virtually everyone would need to use in practice.
Patents suffer from the problem of trivial extension. The typical thing I have noticed is that you can take a well known thing, not change it at all, and use it in a context it hasn't really been used before. Then patent that. I can't patent Huffman Coding? What about Huffman Coding... in space?
@@TreeLuvBurdpu tell that to the patent officer that approved some of the patents I have been directly or indirectly involved in. In theory, completely agree with you. In my experience that is not the case in practice, and the concrete application is more important. I haven't read the actual patent myself, but I found out one of the teams I worked with successfully got a patent for using CNNs for image classification in a certain medical context. They didn't invent using CNNs for image classification.
Part of the problem is that the standards for whether something can be patented or not are completely variable. In theory it has to be original and demonstrate an inventive step - something not obvious, something the inventor thought of or did. So Microsoft taking Huffman encoding and simply applying it to a few common critical structures and patenting that should not pass the inventive-step threshold, because it's an obvious application that anyone using Huffman encoding would do. Patent law, especially in the software space, needs real reform somehow.
@@thisisabsolutelystup The huffman encoding in space example wasn't real. I was referencing the "recycled in space" trope. Maybe I am too chronically online.
There is a whole book of these and even that just barely scratches the surface. Look up Math You Can't Use by Ben Klemens. It's over 2 decades old now so I'm sure we could easily write several new editions of it with newer patent BS like this one.
This is a neat video on an interesting subject; however, it was a bit too fast-paced for me. A few things could be done differently:
- Sometimes you list goals but don't put them in written form, which makes things harder to follow.
- A synthetic representation of the algorithm as a whole would also help a lot; seeing the states of the variables evolve in real time is neat, but insufficient.
- Getting the meaning of abstract notations takes some time; a more explicit naming convention could help.
I agree. I couldn't follow along as well as I'd have liked. There were so many symbols with single-letter names, and mentions of technical terms without accompanying visualizations. In an ideal world where the editor never gets tired and never has to take breaks, every variable in an equation would constantly have its definition somewhere on-screen, either as text or as an image explaining what it is. But I do really appreciate the fact that the video exists at all. It's a lot more digestible than... papers.
It's actually being used in JPEG XL. The problem is so few programs support it despite its backing from every major web browser, photography company, open source organisation, and cybersecurity group. Cuts the size of PNGs in half in most cases while still retaining lossless quality.
@@keisisqrl Google killed JPEG XL in favor of AVIF, not WebP. WebP is no longer being developed. Mozilla did not kill off JXL either, it is still available as a nightly flag. And Apple has full support for JXL across all their OSs including WebKit.
6:13 Took me a second to figure out how clever this was. Assuming you're playing a game of hangman in which the word to be guessed is an actual English word, it could be plane, plant, plank, or plans. In that case, saying that the last letter is a vowel is worth two bits of information (as it divides the space of valid possibilities by 2^2). Saying that the last letter is a letter is worth zero bits, given these assumptions.
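You can check the bit counts with a quick computation (word list taken from the comment above):

```python
from math import log2

words = ["plane", "plant", "plank", "plans"]

def info_bits(candidates, pred):
    # bits learned = log2(candidates before / candidates after the hint)
    kept = [w for w in candidates if pred(w)]
    return log2(len(candidates) / len(kept))

vowel = info_bits(words, lambda w: w[-1] in "aeiou")   # only "plane" survives: 2.0 bits
letter = info_bits(words, lambda w: w[-1].isalpha())   # nothing ruled out: 0.0 bits
```

"Vowel" cuts 4 candidates to 1, i.e. division by 2^2, so exactly two bits; "letter" keeps all four, so zero bits.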
jarek duda: check out this really cool compression algorithm I made! it's free and unpatented and twice as good as Huffman coding.
Microsoft: check out this really cool compression algorithm I totally didn't steal from jarek duda! it's proprietary and patented and twice as good as Huffman coding.
patent office: yeah, we don't have it on file. Here's your patent, Microsoft
jarek duda:
I'm of the opinion that software shouldn't be patentable, but also that copyright should expire after no more than 28 years, although 14 would be preferable. Years ago, back when I was in middle school, I remember playing with a compression program written in QBasic which used arithmetic coding. I thought it was pretty cool, but at the time didn't know how to use it. That was significantly more than 17 years ago now, and even if the idea were capable of being patented, it definitely should no longer be, as 17 years is the limit for a patent. Of course, I also think that our patent system needs to be fixed in more ways than one: people with technical knowledge should be the ones to review patents in technical fields, and the law, which gives us the criterion that inventions must be novel, should actually be applied to prevent patent squatting as MS, and many other corporations acting in bad faith, have done.
No, we just need to remove patent laws. There will be inventive AI that can solve things on the fly. There is no reason to give rights to ideas to the first organization that runs the search.
My data compression professor last semester made really good points about how Huffman still has a place. It’s just a substitution thing and you can simply read the encoded text if you have the scheme. It’s a lot more performant when it comes to decoding.
I love asymmetric compression schemes personally, but the wiki claims that fractal compression has largely been abandoned because it became clear that fractal compression would always be very asymmetric.
@@i-am-linja I thought prior art would be enough, and maybe it's still enough and the patent for ANS wouldn't really hold in court, who knows. Patents are not enforced in my jurisdiction, they can't. However we live in a global world and using US patented algorithms would prevent me from publishing my software in the US.
Thank you so much for sharing this! I was implementing LZMA earlier this year (for fun, and to learn how it works), and when I got to LZMA's range encoding component I realized how genius this approach is. It took me days to understand, but your video would've helped so much if I was around haha
We need another edition of Math You Can't Use by Ben Klemens just to cover this BS. Well, I mean we probably need a reconsideration of the patent regime first. Software - being just maths - should *not* be granted patents at all. If the whole idea behind patents was to foster innovation, then "we've been sending completely unnecessary amounts of data over the internet because patent trolls" is definitely evidence that it doesn't work that way.
Most of the time software cannot simply be patented; however, source code is covered by copyright. They are not the same thing. There are a few exceptions, but you need to dig into the specific cases to have any meaningful discussion of that area.
@@mytech6779 copyright doesn't matter here since we are talking about patents. Copyright also only applies to the specific implementation - or expression - of an idea. Therefore although there are problems with copyright as well it doesn't at least give someone a complete right over an idea.
I remember seeing a demo for fractal encoding instead of JPG in the early 1990's, in the days when 14.4kbps modems were still very expensive, so I only had 2400 baud dialup. The demo was a real-estate website with pictures of houses. Interlaced JPG images took a few minutes to download and view, whereas the new fractal format was
Even with today's internet speeds, I sometimes end up having to deal with ridiculously slow connections, ranging from minutes to hours. Genuinely unable to look at pictures sent via a specific service because it loads 10 rows of pixels per second. Compression like that should be a priority for all online services, and there is no excuse for CPU load being high, considering how stupidly fast they are now. Oh well, how else are they gonna get us to pay more for "faster" internet?
Saw some comments complaining about the music, and just wanted to say that I disagree completely. Having some music is nice, and the choices you made are very low energy and out of the way anyways.
I think it's fine as well, though it would be nice to have an option on YouTube to turn off the bg music if you have your own preferred soundtrack, though the video would need to be made without merging the soundtracks, making the file a bit bigger😅 I don't think that should be a huge issue🤔
Slight gripe: at 9:00 it was a bit confusing that you described the operations as "outputting a bit representing which half we're in" and "centering and renormalizing" instead of just saying we want to find a number within the range, that halving is just how binary fractions work, and briefly pointing out why the "centering and adding 0s" works. Otherwise an interesting video and a cool subject, thanks!
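For what it's worth, "output a bit and renormalize" is the same thing as discovering the leading binary digits of the final number. A minimal sketch of just that step (floating point only for illustration; real coders use integer ranges):

```python
def emit_determined_bits(low, high, out):
    # While the interval [low, high) lies entirely in one half of [0, 1),
    # the next binary digit of every number in it is already determined:
    # output that bit, then rescale the half back up to [0, 1).
    while True:
        if high <= 0.5:        # interval inside [0, 0.5): next digit is 0
            out.append(0)
            low, high = 2 * low, 2 * high
        elif low >= 0.5:       # interval inside [0.5, 1): next digit is 1
            out.append(1)
            low, high = 2 * low - 1, 2 * high - 1
        else:
            break              # interval straddles 0.5: digit not determined yet
    return low, high

bits = []
low, high = emit_determined_bits(0.6, 0.7, bits)
# every number in [0.6, 0.7) starts 0.10... in binary, so bits == [1, 0]
```

After the two bits are emitted, the interval has been rescaled to roughly [0.4, 0.8) and the coder continues narrowing it with the next symbol.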
I can see the intention of the video, but what he doesn't realize is that thinking in whole probabilities is not functional; that's not how the machine works, and it's ultimately extremely inefficient. It's a classic "looks good on paper": in practice you are trying to force something that doesn't fit.
Took me some time wondering what he was getting at, but it works I guess. I understood at the end of his explanation that it was just writing the number in base 2.
@@k1ll3trs86 I don't understand what you're trying to say. This is not only good on paper but good in practice too. It is literally strictly better than Huffman, and Huffman is used everywhere.
@@anonymous-q2b5s I think what's meant is that keeping the whole real number in memory and then writing it out in binary is a bad idea (because doing math on long real numbers is a bad idea).
There is a FUSE file system called PiFS. It can store data using some sort of arithmetic coding where the resulting number itself gets compressed as an offset + length into the unlimited digits of Pi.
What a wonderful presentation and justification of ANS you have posted. I loved watching it, although the speech is a bit fast; I used .75 speed 😊. I would recommend you create more videos explaining other compression algorithms too, which in fact have arithmetic roots. Highly recommended and highly appreciated! One video I would want could be on different methods of approximation, and a second on almost all the mostly forgotten bitcode generation methods in one go. Also error correction: Hamming codes, Reed-Solomon codes, etc. Looking forward to more content ❤
I stumbled upon ANS for the first time in this video, so thanks, but I didn't understand it at all. I had to watch Stanford's EE 274 on the subject, where it seemed ... approachable, at least. Not that a video of this length was going to fully explain such fairly advanced topics, and the pace is quite fast, but it was an excellent introduction to the subject, and it motivated me to do further research, so goal achieved I guess. Also the effort in making it really shows.
I got a 5 on AP stats and I'm pretty lost lol. This video is less stats heavy, at least the stuff you learn in AP, and more computer science jargon heavy. It also just moves uncomfortably fast for the stuff I have some handle on.
I am usually not in the "wow this is so cool but I didn't understand anything" group, but I think I might be with this vid. The video is paced pretty fast, if I wanted I would probably rewatch it. Currently I just feel like falling into a rabbit hole (which happens to me quite a bit) but not keeping up with the falling.
Haven’t watched yet but there’s a neat thing: arithmetic coding, or a closely related thing, allows one to convert a sequence of fair N-sided coin outcomes to any other discrete uniform distribution without rejection, though if you don’t want to reject you’ll need unbounded memory, and if you want to flush time to time, you’re effectively rejecting some information you’re getting from the initial sequence. But at least you can fine-tune how much you want to reject/save and change it over time! Maybe it’s already in the video but if it’s not, here. Also it’s as if there’s some duality about entropy generated by rejection, non-uniformness introduced if we decide to reject only a given number of times and then be complacent, working memory etc.. I don’t know in what kind of inequality this can be written but I hope information/coding theorists have it already discovered and generalized similarly to what happened to the uncertainty principle (in QM and Fourier analysis-now there exists a general operator algebra approach IIRC).
For the folks interested in functional programming or calculating correct programs, Richard Bird's "Pearls of Functional Algorithm Design" has two chapters on arithmetic coding, one on the rational number variant and one on the integer variant.
Arithmetic coding is widely used. It was the only option for some standards, even before the patents expired. So it got used, patents or not. I've had to implement it for things like JBIG coding for FAX, where it's specifically there to improve on the Huffman coding in the previous compression methods used for FAX.
There is a technical and a business reason. The technical is that HEIC - like the rival AVIF - goes far beyond just arithmetic compression. It's really an adaptation of techniques developed for video compression, and will wipe the floor with JPEG, even arithmetic-compressed JPEG. The business reason is that HEIC, being based on the h265 video codec, is a heavily patented format - and one of the contributors to that patent pool happens to be Apple. Not only do they get to use it at a reduced licencing cost, but whenever another company wants to use HEIC, Apple gets a cut of the royalties. It's clear that JPEG is going to decline, and there are a number of formats competing to replace it - it's in Apple's best interests to make sure that the format they stand to profit from is the one to achieve dominance, and making HEIC the default format for taking photos on iPhones and iPads gives it a huge advantage - every photo-sharing service needs to at least support uploading HEIC, or else Apple users will find their photos won't share. There's a rival format, AVIF, which is supposed to be patent-free (you can never be sure) and pushed by a rival consortium, with Google as the major backer. They aren't trying to score any patent money, but that doesn't mean it's entirely altruistic: Their aim is to promote an open and unpatented format in order to avoid having to pay up royalties to the likes of Apple.
I'll be honest, it was hard to follow; maybe I'm just dumb 😅 The music was fine imo, but if people complain you can "duck" it (lower the volume under speech).
no you're not dumb. This video seemed to be made for an audience that already knows compression algorithms and information theory at a competent level.
@@jeffreychandler8418 Not even, I'm pretty familiar with many of the concepts but he's just running through the ideas with no space to breathe or re-emphasis on the critical aspects.
@@ArnaldurBjarnason That's validating to hear xD I usually ascribe misunderstandings to a personal issue rather than a communicator issue (unless it's REALLY bad, like the Zion NPS instagram or something like that). As I think about this video again, honestly a two-minute succinct summary of existing algos would be great, then trim the fat on the new algos to make those simpler, and allow more breathing space.
I actually used both algorithms in my lossless video codec - ScreenPressor. Initially it was range coding (arithmetic coding with integers), but that involved division, which isn't very fast. A later version used rANS, which is faster but requires compressing data in reverse order. The end result was pretty great: best-in-class compression and good speed (when the content is not too complicated, as is typical for screen data).
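The reverse-order quirk is easy to see in a toy static rANS. This is only a sketch (made-up two-symbol alphabet, no renormalization, leaning on Python's big integers), not a practical coder:

```python
freqs = {'a': 3, 'b': 1}           # symbol counts, summing to M
M = sum(freqs.values())
cdf, acc = {}, 0
for s in freqs:                    # cumulative starts: a -> 0, b -> 3
    cdf[s] = acc
    acc += freqs[s]

def encode(msg):
    x = 1                          # initial state
    for s in reversed(msg):        # rANS pushes symbols in reverse...
        f = freqs[s]
        x = (x // f) * M + cdf[s] + (x % f)
    return x

def decode(x, n):
    out = []
    for _ in range(n):             # ...so the decoder pops them forwards
        slot = x % M
        s = next(t for t in freqs if cdf[t] <= slot < cdf[t] + freqs[t])
        x = freqs[s] * (x // M) + slot - cdf[s]
        out.append(s)
    return ''.join(out)
```

Frequent symbols grow the state less, so they cost fewer bits; the final integer is what gets written out. A production coder keeps the state in a machine word and streams bits out as it renormalizes.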
I'm stuck on the variable-length encoding in your explanation. "Shorter codes encode more frequent symbols". How do you indicate the length of bits that should be decoded into a single symbol? Presumably, short codes would exist inside longer encodings; for instance "101" is inside "1011", and "10" is inside "101".
In the Huffman algorithm, the codes are designed such that no shorter code is a prefix of a longer code. So in the example you've provided, 1011 would mean you have the symbol "101" and then another symbol starting with "1". The way Huffman is usually taught in school, this involves following a tree, where 1 is right and 0 is left, and the decoder traverses that tree with each bit you read until you eventually hit a symbol, output that symbol, and then go back to the top of the tree. With that explanation, the code 101 would mean you went right, left, right in the tree, hit a symbol, and went back to the top. Hopefully this explanation is easy to understand; it would be better with diagrams, but those are hard to make in youtube comments.
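That tree walk fits in a few lines; here's a sketch with a made-up prefix-free code table (dicts as inner nodes, symbols at the leaves):

```python
# hypothetical prefix-free table: no code is a prefix of another
codes = {'a': '0', 'b': '10', 'c': '11'}

# build the tree: follow all but the last bit, put the symbol at the leaf
root = {}
for sym, code in codes.items():
    node = root
    for bit in code[:-1]:
        node = node.setdefault(bit, {})
    node[code[-1]] = sym

def decode(bits):
    out, node = [], root
    for bit in bits:
        nxt = node[bit]
        if isinstance(nxt, dict):
            node = nxt            # keep walking down
        else:
            out.append(nxt)       # hit a leaf: emit symbol, restart at the top
            node = root
    return ''.join(out)

print(decode('01011'))  # abc
```

Because no code prefixes another, the decoder always knows exactly when one symbol ends and the next begins.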
By the way, a coding scheme where this property holds (no word is a prefix of another one) is called a prefix code. IIRC from reading CLRS, a non-prefix code can always be transformed into a prefix code, so textbooks usually focus on prefix codes. Though I have very little knowledge about codes, so I might be wrong.
Imagine leaving your hard work public for the sake of humanity and greedy corporation takes credit for it and ends up forgetting it in their patent drawer
I'm pointing at YOU, database users! You should know the data types and use them accordingly. You don't need a string to store one of 3 states, use an enum. You don't need an int for a number between 0 and 100, use a byte. It's 4x smaller. You should always use unsigned if you don't need negatives. It doubles the positive range. Also use decimal, it can store decimal places more precisely than float. If you don't have a horrendously huge range, just use it.
@@dadamaldasstuff1816 wdym? A string can be two bytes if it holds just one char (and the null char appended). An enum is in most languages an integer, which can be 32 or 64 bits and use up more space. I'm sure the DB stores it as one byte though. I still agree that you should always use the right type; it even helps you catch bugs early when you get screamed at for having the wrong type.
It would be nice if the video with a title about halving image file sizes actually discussed halving image file sizes. Can you explain why this is specifically good for image compression? Are there any actual implementations of arithmetic encoded images and how do they compare to huffman encoded ones, in terms of filesize? How did you arrive at the 50% improvement value? JPEG-XL, which is the only format you mentioned that kinda used the ideas presented in this video, only has a guaranteed file savings of about 20% when using the JPEG Re-encoding feature on an image that was already a JPEG. How does this raw data stream compression algorithm compare to the image-specific compression algorithms used in various formats?
Quite an interesting subject. I didn't treat probability as something related to my field of interest, until now. A year ago I started writing my own lossy audio codec, which is done for now, but I didn't implement any kind of entropy coding. I tried Huffman and RLE, but the only thing I got was a bigger file size and increased complexity. The problem with these algorithms is that there must be a defined set of symbols and a strong pattern in the encoded message, otherwise it won't work. Using raw binary sequences as symbols doesn't really work, since there are quite a lot of them and their probabilities, sorted in decreasing order, are far from the shape of a 1/x function. The only pattern I saw was groups of 0's with fewer 1's in between.

I still can't understand how MP3 pre-processes values so that they are suitable for entropy coding. By values I mean quantized frequency coefficients after the transform from time to frequency domain, MDCT in my case. After psychoacoustic analysis I know which coefficients can be discarded and what precision to use for the rest of them. Coefficients are divided into subbands with a fixed size, let's say 16, and then they are quantized with different precisions. A single frame can have subbands quantized with both 2 bits and 16 bits, although now I know that 16 bits was way too much even for almost-lossless compression. Each subband has a scalefactor field and a precision field, saying how many bits are used per coefficient and what scalefactor was used to divide values before quantizing them. Quantized values can be basically anything; I didn't see any strong pattern there. From what I heard, MP3 uses more or less fixed precisions for coefficients, and maybe that helps to entropy-code them. But in my codec everything is dynamic, and the bitrate can vary from, for example, 20 kbps to 400 kbps for the same file and the same quality setting: 20 kbps would be used to encode a single tone, while 400 kbps would be needed for a drum hit.
Even quantized values are not fixed-length: for example with 3-bit precision, 0 would be just 0 in binary, 1 would be 100, 2 is 101, -1 is 110, -2 is 111. Scalefactor and precision fields also have variable lengths and are differentially coded. There are also magic values for those fields which can specify the number of empty subbands, so a few tens of empty coefficients can be stored in a few bits. And these aren't all the clever mechanisms that I used instead of entropy coding. But I'd like to understand how to implement this coding for audio / how to prepare values for that coding. With that and some other changes I could create an even better codec, which already outperforms MP3 a bit. It's called DAC (Dynamic Audio Codec), and I have a video about it. Unfortunately it's in Polish, but today that shouldn't be a problem. I also shared the working codec (a browser is all you need to run it), so take a look if you want. Anyway, thanks for the video and I'm waiting for the next one!
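Guessing at how the mapping described above (0 → 0, 1 → 100, 2 → 101, -1 → 110, -2 → 111) generalizes, it looks like a nonzero flag, a sign bit, then |v|-1 in the remaining precision bits; a hypothetical sketch:

```python
def encode_coeff(v, p=3):
    # '0' alone for zero; otherwise '1', a sign bit (1 = negative),
    # then (p - 2) bits holding |v| - 1
    if v == 0:
        return '0'
    sign = '1' if v < 0 else '0'
    return '1' + sign + format(abs(v) - 1, '0{}b'.format(p - 2))
```

With p = 3 this reproduces all five codes from the comment; zero, the most common coefficient, gets the single-bit code.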
Thank you, this helps me understand arithmetic coding and ANS. Both (and Huffman) assume the symbols are independent. Could Markov chains improve compression rates, or do techniques based on repeated occurrences make Markov chains irrelevant?
Great manim animations. It’s a shame ANS isn’t as widely used. There are similar patent issues with lossy compression algorithms as well, but those are even more complex since there are two vectors of optimization (compression and reconstruction accuracy) where the second does not have a concrete way to measure it.
There are some synthetic benchmarks, like 7zip used to show programs compressed with zip vs 7zip. A standardised benchmark may be useful, though compressors could overfit for that benchmark and look better than they really are
@@Aera223 well zip and 7zip are still lossless, but as for lossy a benchmark doesn’t exist for that reason. Some of the reconstruction quality will be subjective. PSNR vs Compression Rate is the most common metric but PSNR can be high while having a very noisy output, or a singular spike of error, as long as it averages out. Some situations lossy compression will have rules for the application such as not deviating more than X amount, or always producing smoothed results
The probability prediction and the encoding method are two separate steps, and the video just focused on the latter and not the former. In most of the video, the probability is fixed (or at least known) and independent, so there's no need to predict anything. In benchmarks using real world data, AI is indeed used for the probability prediction; for example, the top 23 (highest compression ratio, lowest file size) compressors of the enwik9 dataset all used some sort of AI and the current best (nncp v3.2) used a Transformer model. In real life applications though, AI is too slow compared to simple, non-AI predictors, so the smaller file size is hardly worth the extra time and power cost. While nncp v3.2 can compress enwiki9 to 49% of that of zstd (zstd 0.6.0 level 22 ultra), it takes 9.6x as much memory and 683x as much time to compress and then decompress the dataset.
0:28 While of course it is stored as a number, any piece of data on a computer is a series of 0 or 1 bits; these are not themselves numbers, but any series of them can also be represented by a number.
I don't really know what all the terms/notation are in that equation so I can't really analyze it properly, but probably the sum is an infinite value, so it becomes a 0*infinity indeterminate form.
The summation scales up linearly as N increases so it'll just diverge to infinity. If your message is twice as long, on average you'll need twice as many bits to encode it. So, the 1/N is there to give us a more useful number to measure coding efficiency.
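Concretely, with a made-up fixed distribution:

```python
from math import log2

probs = {'a': 0.5, 'b': 0.25, 'c': 0.25}

# entropy: the average number of bits per symbol an ideal coder needs
H = -sum(p * log2(p) for p in probs.values())  # 1.5 bits per symbol

# a message of N symbols therefore needs about N * H bits in total;
# dividing that total by N recovers the per-symbol rate, independent of N
```

Doubling N doubles the expected total bits, which is exactly why the 1/N factor turns a diverging sum into a useful rate.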
I first heard of this about 30 years ago, and I was so excited that I wrote it in C. I can confirm it compresses better than Huffman, but twice as well? Maybe with very specific data. But it was quite slow, most probably because I wrote the adaptive version. It should be written in assembly, because it has some very special needs that no higher-level programming language provides, and writing a workaround to get that missing feature in C is a pain in the afternoon, and also slow. Maybe using libgmp will be helpful if you don't want to use assembly - libgmp is written in assembly, as far as I know.
A downside with a lot of these is that they are slow: Arithmetic coding = Very slow, Bitwise Range Coding = Rather slow (but faster than traditional arithmetic coding), LZMA / 7zip uses this. Huffman can be made semi-fast (and a little faster if one limits symbols to 12 or 13 bits vs 15 or 16; with a lower limit the decoder's lookup table can fit into the L1 cache). Rice coding can also be made fast (more so when a similar length-limited is applied), but only works well with particular probability distributions; but can be turned into an adaptive scheme with relatively little added cost. One trick I have used is to combine Adaptive Rice with a "swap towards front" table (each symbol is encoded as an index into a table, and when encoded, the symbol at index J is swapped with the symbol at index (J*15)/16, which is used the next time this symbol is encoded). This scheme is adaptive, can be almost as fast as Huffman, and almost as good (in terms of compression) while needing less memory footprint (sometimes useful) and less code (also sometimes useful), and (unlike Huffman) the decoder lookup tables can be made read-only. But, a lot depends on what one needs...
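A rough sketch of the swap-towards-front scheme as described (Rice parameter and table setup are guesses, not the commenter's actual code):

```python
def rice_encode(n, k):
    # Rice code: unary quotient (ones terminated by a zero), then k-bit remainder
    q, r = n >> k, n & ((1 << k) - 1)
    return '1' * q + '0' + (format(r, '0{}b'.format(k)) if k else '')

def encode_symbol(table, symbol, k=2):
    j = table.index(symbol)        # the symbol is coded as its table index
    bits = rice_encode(j, k)
    t = (j * 15) // 16             # swap toward the front: frequently seen
    table[j], table[t] = table[t], table[j]  # symbols drift to small indices
    return bits

table = list(range(256))           # one entry per byte value
```

Because small indices get short Rice codes, the table adapting toward frequent symbols approximates an entropy coder without maintaining explicit probabilities, and the code path stays simple and branch-light.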
This is an interesting topic and discussion, but the presentation and explanations were really hard to follow. The music was *mostly* unobtrusive, but there were a couple points where it really came into the foreground due to the tonal content and became distracting
It is important to understand that Huffman encoding requires building a "dictionary" (the Huffman tree) for the encoding, and that must be included in the file, so it increases the size of the compressed file. Additionally, the encoding used by Huffman is prefix-free, meaning that no symbol's code can be a prefix of any other code in the compression dictionary. That might have been one of the reasons people were having trouble following. Maybe add the tree and the prefix-free property in your next revision.
Haven't finished the video, but are they as efficient at decoding? I realise there's a lot of overhead on modern devices, but decoding and rendering images happens so often in a normal user session that it does make some sense to trade minimal disk space in the big picture for faster decoding. Just an idea on another reason they may not have moved on either?
Even if it's slower to decode, just like zip files take a while to open, it would be worth doing. Transferring compressed data through a network is worth the cost of the computation and time it takes to encode/decode at the sender's or receiver's end, because transferring uncompressed data takes even more time and money than that.
@@arxalier2956 not all images get transferred through networks though and webmasters have to make tradeoffs between optimization on their end and load speed on the users end. You lose something like 50% of your visits if the page still is loading after 3-5 seconds
@@lainwired3946 oh. I thought you were asking about ANS relative to arithmetic coding. Derp. Arithmetic coding is slower than Huffman, and ANS I think is faster than Huffman.
I recommend anybody using this method to also try to implement more layers, compressing with algorithms and compressing neighbour pattern delimiters with some form of character. This will greatly improve the compression and complexity of your data!
@@ZeroPlayerGame Find the probability of everything, and interpret "1" as the most probable thing, "2" as the second most probable thing, etc. Voila, optimal compression! there may be some slight implementation details to be worked out... Seriously though, one thing I found out many years ago which stuck with me, is that most compression algorithms don't use the whole number line this way. In fact, they only use a tiny fraction of the numbers on the number line. Case in point: A random file is practically never a valid zip file, not even if you strip away headers and other format bookkeeping stuff. Even arithmetic encoding and ANS (as described here) leave numbers on the table, so to speak. Mostly due to end handling. Defining a bijective compression scheme (where every sequence of bits decodes to just one message, and all messages can be encoded) is fiddly and most compression people don't see the point. But I think such algorithms are beautiful.
@@Mnnvint what's "2"? What's your alphabet? Huffman is bijective. Arithmetic coding is bijective if you encode a whole number of bits (might not be the case). The number line is a mathematical abstraction and doesn't exist in the physical world where encodings have to live.
@@ZeroPlayerGame Find the probability of the most likely message, and assign it to "1" on the number line. Find the second most likely message, and assign it to "2". There is no alphabet, we deal with the whole message at once. "Encodings" are a mathematical abstraction too, so they certainly don't have to live in the physical world if numbers don't :) Personally I like mapping things to the natural numbers/positive integers. I specifically said arithmetic encoding is not bijective *as described in this video*. There exist bijective arithmetic encoding implementations. There even exist bijective variants of Huffman, though that should probably be considered a variant and not just a different implementation. Bijective compression has a special meaning. It doesn't just mean that there's a bijection between messages and encodings, that's a much lower bar (although most compression methods don't meet that either). Bijective compression means that in addition to satisfying this weaker constraint, every sequence of bytes is a valid file in the compression format. There can be no "unexpected end of file" or other format errors, any sequence of bytes has to decode to a unique sequence of bytes.
Erm, pretty sure I'm watching this video encoded with arithmetic coding. It's the basis of the H.265 and H.264 video codecs, along with their still-image profiles. The H.264 baseline profile is closer to Huffman, but the main profiles expect to use context-adaptive binary arithmetic coding.
Nice video! Just one suggestion: don't let the video black-out for so long while you're talking. I was confused thinking something was wrong with the image, because you were still talking regardless hahaha
Interesting topic, good explanation, nice animations, and a special thanks to Frédéric Chopin. YouTube is as bad as the US patent office and doesn't recognise all the authors of the music. YT didn't recognise Satie either.
13:13 I'm not a big fan of the probability part of this but I suppose it is likely a good thing overall seeing as it is beneficial to your average image or text. Unless the entire thousand letter long text document is just repeated "As" you'll have no problem.
Some day I will grok how you output a Huffman file with the weights included, or possibly how they are inferred from an adaptive version - and not taking up masses of space, relatively, for small files.
"this is one of the ways your jpegs [...] compress data without losing information" - jpeg, lossless compression? JPEG does lose information, but it was optimized to not lose too much *meaningful*, perceptible information (that is, until you zoom in).
JPEG uses both lossy and lossless compression. The lossy compression first, then lossless compression of that as a last step. Obviously with the final result being lossy. The lossless last step is a variant of Huffman. So to say that JPEG compresses data without losing information is a half truth. Badly worded on his part, I'd say, without clarifying that it uses both and the final outcome is indeed lossy.
I think you'd make a great youtuber, depending on how long it took you to make this. If you keep the style and make some in-depth videos on things where online resources suck donkey like shaders or common library implementations you'd probably hit 100k before the year ends.
TO CLARIFY: don't be afraid to use these algorithms! the patents for arithmetic coding are all expired, and the only justifiably patentable material in Microsoft's patent are specific implementations of adaptive-probability rANS (which JPEG XL actually doesn't use). Microsoft has also given anyone permission to use it in open source, royalty-free codecs
at around 7:20, i put "NH(X) = log_2 X" ... H(X) is the entropy of a symbol, not of the code! X on the left is a symbol, and X on the right is the code ... sorry! let me know of any other corrections. and thank you for watching!!
I saw some mixed reactions to the music. I made a poll about this for future reference, but I don't know how to link it here, so please check it out in my channel
Please reupload it fixed.
For the purpose of encoding files, even with the full ASCII set, wouldn't either of these methods fall down, because instead of being base 10 we are base 256 converted into base 2? Granted, the opcode values would have very low occurrence for a probabilistic approach (unless the file contains assembly). Then how do these algorithms deal with character sets like UTF-16 or UTF-32? Wouldn't that imply that even the arithmetic approach would need a given char value to be 32 bits, at which point we are in an almost uncompressed state?
@@gardian06_85
UTF-8, UTF-16, and UTF-32 are all encodings for the same character set. So you could compress a UTF-32 file by encoding it as UTF-8 and then using one of these methods.
You could also use 64-bit integers, or even greater bases.
But most compression formats don't encode bytes directly. They encode symbols that are instructions to the decoder, like a little machine. "Copy the next three bytes to the output stream." "Look back 40 bytes in the output and copy 25 again."
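As a rough illustration of those decoder instructions (a toy instruction format made up for this sketch, not any real codec's), a decoder for such literal/copy symbols might look like:

```python
# Toy sketch of an LZ77-style instruction decoder (hypothetical format):
# symbols are ("literal", bytes) or ("copy", distance, length).
def decode(symbols):
    out = bytearray()
    for sym in symbols:
        if sym[0] == "literal":
            out += sym[1]                 # copy raw bytes to the output stream
        else:                             # ("copy", distance, length)
            _, dist, length = sym
            for _ in range(length):       # byte-by-byte, so overlapping copies work
                out.append(out[-dist])
    return bytes(out)

print(decode([("literal", b"abcabc"), ("copy", 3, 6)]))  # b'abcabcabcabc'
```

Note that the copy length may exceed the look-back distance; copying byte by byte is what makes "look back 3, copy 6" repeat the pattern.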
please remove background music because it distracts. we can always listen to music while watching videos, but we cannot remove them from videos.
You can also get the code from the h264 or hevc reference encoder (which both should only use expired patents by now) for how they handle the variable probabilities and how they implement it. Basically there's a bunch of bins that store probabilities and they have parameters for a starting probability and how much it should be updated each time.
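A hedged sketch of the idea (not the actual reference-encoder code; the starting probability and adaptation rate here are made-up parameters, not the values the H.264/HEVC encoders use):

```python
# Adaptive probability "bin": tracks P(bit == 1) and nudges the estimate
# toward each observed bit by a fixed fraction (exponential decay).
class ProbabilityBin:
    def __init__(self, p_one=0.5, rate=0.05):
        self.p_one = p_one   # starting probability parameter (assumed)
        self.rate = rate     # how much each observation shifts the estimate (assumed)

    def update(self, bit):
        target = 1.0 if bit else 0.0
        self.p_one += self.rate * (target - self.p_one)

bin_ = ProbabilityBin()
for b in [1, 1, 1, 0, 1, 1]:
    bin_.update(b)
print(round(bin_.p_one, 3))  # drifts above 0.5, since 1s dominate
```

The real codecs quantize this into a small table of probability states for speed, but the behaviour is the same: recent bits pull the estimate around.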
0:47 intellectual property law is a completely fair and balanced system with no exploits whatsoever
Why the hell can you patent something from the public domain? It should never be enforceable, because anyone can claim they obtained it from the original public domain source itself.
One of the most frustrating things I've ever seen
Spiffing Brit reference detected
@@Gelatinocyte2someone gets it
@@KillFrenzy96 They didn't patent it directly, they patented some straightforward expansions and optimizations that virtually everyone would need to use in practice.
Patents suffer from the problem of trivial extension. The typical thing I have noticed is that you can take a well known thing, not change it at all, and use it in a context it hasn't really been used before. Then patent that.
I can't patent Huffman Coding? What about Huffman Coding... in space?
You can't patent things that are in common usage. You have to be the inventor.
@@TreeLuvBurdpu Nope, as long as you can prove it's a novel idea, there's a possibility of getting a patent. It all depends on how good your lawyer is…
@@TreeLuvBurdpu tell that to the patent officer that approved some of the patents I have been directly or indirectly involved in.
In theory, completely agree with you. In my experience that is not the case in practice, and the concrete application is more important.
I haven't read the actual patent myself, but I found out one of the teams I worked with successfully got a patent for using CNNs for image classification in a certain medical context. They didn't invent using CNNs for image classification.
Part of the problem is that the standards for whether something can be patented or not are completely variable. In theory it has to be original and demonstrate an inventive step - something not obvious, something the inventor thought of or did.
So Microsoft taking ANS and simply applying it to a few common critical structures and patenting that should not pass the inventive-step threshold, because it's an obvious application that anyone using ANS would do.
Patent law, especially in the software space, needs real reform somehow.
@@thisisabsolutelystup The Huffman-coding-in-space example wasn't real. I was referencing the "recycled in space" trope. Maybe I am too chronically online.
If there were enough algorithms to do this with, I could imagine making a series of "Algorithms you can't use"
There totally are a lot of them in this situation, though usually there are good alternatives.
There is a whole book of these and even that just barely scratches the surface. Look up Math You Can't Use by Ben Klemens. It's over 2 decades old now so I'm sure we could easily write several new editions of it with newer patent BS like this one.
If I had to guess, there most likely would be plenty of algorithms you can't use
I would not recommend watching this video, it's very dangerous for a developer because it makes you liable if you end up using one of them
You can use in FOSS projects no?
This is a neat video on an interesting subject; however, it was a bit too fast-paced for me. A few things could be done differently:
- Sometimes you list goals but don't put them in written form, which makes things harder to follow.
- Having an overview representation of the algorithm as a whole could also help a lot; seeing the states of the variables evolve in real time is neat, but insufficient.
- Getting the meaning of abstract notations takes some time; a more explicit naming convention could help.
I agree. I couldn't follow along as well as I'd have liked. There were so many symbols with single-letter names, and mentions of technical terms without accompanying visualizations. In an ideal world where the editor never gets tired and never has to take breaks, every variable in an equation would constantly have its definition somewhere on-screen, either as text or as an image explaining what it is.
But I do really appreciate the fact that the video exists at all. It's a lot more digestible than... papers.
It's actually being used in JPEG XL.
The problem is so few programs support it despite its backing from every major web browser, photography company, open source organisation, and cybersecurity group.
Cuts the size of PNGs in half in most cases while still retaining lossless quality.
Google pulled JPEG XL from Chromium in favor of their own webp which effectively killed it as a web format, Mozilla and WebKit (Safari) followed suit.
JPEG XL is not lossless
It can be, just depends on settings
@@darthjaffacake8573 I was wrong
@@keisisqrl Google killed JPEG XL in favor of AVIF, not WebP. WebP is no longer being developed. Mozilla did not kill off JXL either, it is still available as a nightly flag. And Apple has full support for JXL across all their OSs including WebKit.
6:13 Took me a second to figure out how clever this was.
Assuming you're playing a game of hangman in which the word to be guessed is an actual English word, it could be plane, plant, plank, or plans. In that case, saying that the last letter is a vowel is worth two bits of information (as it divides the space of valid possibilities by 2^2). Saying that the last letter is a letter is worth zero bits, given the assumptions.
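The bit-counting above can be checked in a couple of lines (using the four candidate words from the comment):

```python
import math

# Four equally likely candidates; a clue that narrows them to one
# carries log2(4/1) = 2 bits of information.
candidates = ["plane", "plant", "plank", "plans"]
vowel_last = [w for w in candidates if w[-1] in "aeiou"]  # just "plane"
bits = math.log2(len(candidates) / len(vowel_last))
print(bits)  # 2.0
```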
There's actually an input method based on this, called Dasher. It's literally harder to misspell words in.
jarek duda: check out this really cool compression algorithm I made! it’s free and unpatented and twice as good as Huffman coding.
Microsoft: check out this really cool compression algorithm I totally didn’t steal from jarek duda! it’s proprietary and patented and twice as good as Huffman coding.
patent office: yeah, we don’t have it on file. Here’s your patent, Microsoft
jarek duda:
The "I made this" meme would have encoded the same information but vastly shorter
@@jb31842 Unfortunately it was patented by Microsoft.
I'm of the opinion that software shouldn't be patentable, but also that copyright should expire after no more than 28 years, although 14 would be preferable.

Years ago, back when I was in middle school, I remember playing with a compression program written in QBasic which used arithmetic coding. I thought it was pretty cool, but at the time didn't know how to use it. That was significantly longer than 17 years ago now, and even if the idea were capable of being patented, it definitely should no longer be, as 17 years is the limit for a patent.

Of course, I also think our patent system needs to be fixed in more ways than one: people with technical knowledge should be the ones to review patents in technical fields, and the law, which gives us the criterion that patents should be novel, should actually be applied to prevent patent squatting as MS, and many other corporations acting in bad faith, have done.
No, we just need to remove patent laws. There will be inventive AI that can solve things on the fly. There is no reason to give rights over ideas to the first organization that runs the search.
I can't tell you how far above my head this video is....but it approaches infinity.
You could probably understand it if it was something physical like sorting books and you did it as a job for about an hour straight
My data compression professor last semester made really good points about how Huffman still has a place. It’s just a substitution thing and you can simply read the encoded text if you have the scheme. It’s a lot more performant when it comes to decoding.
Yes, the best compression schemes have multiple algorithms, and choose the appropriate one. Nobody is saying to just throw Huffman away.
I love asymmetric compression schemes personally, but the wiki claims that fractal compression has largely been abandoned because it became clear that fractal compression would always be very asymmetric.
1:25 "hello, how are you? fine, thank you"
oh my gaah!
I know that from a meme someone sent and now wonder how relevant it is to the world
Hello everynyan
@@puppergump4117 The anime was surprisingly influential in that it also coined the word "waifu", from one teacher saying "mai waifu".
im out of the loop, can someone fill me in?
Typical fucking Microsoft, trying to patent something in the public domain.
God how I hate that company.
@@koghs Look up the Microsoft hate website. All tech corporations are dog shit evil.
@@koghs Microsoft is notorious for being patent-trolls towards FOSS
the problem is copyright/trademark/IP law, Microsoft is a symptom
you should blame the root cause if you want the problem to be fixed
It's usually said of journalists but, "you don't hate Microsoft enough" 😂
They used to do that a lot before. But now they try to compete more with online services.
So what I got from this video is that we need patent law reforms and also Microsoft needs to get its shit kicked in REALLY hard?
Better: software patent abolition. We don't have them in Europe and we don't want them.
@@DiThi Nice, but is there a way to _prevent_ an American company from patenting your idea?
@@DiThi Or failing that, to prevent said patent being enforced in your jurisdiction?
@@i-am-linja I thought prior art would be enough, and maybe it's still enough and the patent for ANS wouldn't really hold in court, who knows. Patents are not enforced in my jurisdiction, they can't. However we live in a global world and using US patented algorithms would prevent me from publishing my software in the US.
Thank you so much for sharing this! I was implementing LZMA earlier this year (for fun, and to learn how it works), and when I got to LZMA's range encoding component I realized how genius this approach is. It took me days to understand, but your video would've helped so much if I was around haha
We need another edition of Math You Can't Use by Ben Klemens just to cover this BS. Well, I mean we probably need a reconsideration of the patent regime first. Software - being just maths - should *not* be granted patents at all. If the whole idea behind patents was to foster innovation, then "we've been sending completely unnecessary amounts of data over the internet because of patent trolls" is definitely evidence that it doesn't work that way.
Most of the time software cannot be simply patented, however source code is covered by copywrite. They are not the same thing. There are a few exceptions but you need to dig into the specific cases to have any meaningful discussion of that area.
@@mytech6779 copyright doesn't matter here since we are talking about patents. Copyright also only applies to the specific implementation - or expression - of an idea. Therefore although there are problems with copyright as well it doesn't at least give someone a complete right over an idea.
Maths and algorithms are fundamentally different. Computers have more in common with logic than maths.
@@wb3904 logic is maths
@@Turalcar No, maths is based on logic, not the other way around.
I remember you way back on the days of KA. Awesome to see what you're making now. Keep it up!!
1:23 Azumanga Daioh reference 「I wish I were a bird.」
I remember seeing a demo of fractal encoding instead of JPG in the early 1990s, in the days where 14.4kbps modems were still very expensive, so I only had 2400 baud dialup. The demo was a real-estate website with pictures of houses. Interlaced jpg images took a few minutes to download and view, whereas the new fractal format downloaded far faster.
Even with today's internet speeds, I sometimes end up having to deal with ridiculously slow connections, ranging from minutes to hours. Genuinely unable to look at pictures sent via a specific service because it loads 10 rows of pixels per second. Compression like that should be a priority for all online services, and there is no excuse for CPU load being high, considering how stupidly fast they are now.
Oh well, how else are they gonna get us to pay more for "faster" internet?
This is way above my paygrade. I need more readings before I watch this video.
I recently used an adaptive Arithmetic encoder and a LSTM for a custom compression algo, the arithmetic encoder was cool to work with
I appreciate the many funny example words you used. Very epic video! Awesome visuals and interesting explanation!
It was a pretty serious video. None of the examples are funny.
ayaya
To be honest this should've been an hour long video, it really went over my head
This video is like the perfect counterpart to my #SoME entry :D
Huffman Trees are the one part I left out of my vid, this is awesome
Entropy coding explained in 5 mins. Must be a new record 🎉
love me a good encoding algorithm breakdown 🔥
animations lookin snazzier
This is an excellent explanation no doubt, but I must be honest I find it very hard to follow and understand
It is only a good explanation if you are familiar with the jargon, in other words, it is a bad explanation.
It's basically an academic paper in video form. I found the information to be well laid out and easy to follow.
If it's hard to follow and understand, it's not a good explanation.
3 minutes into the video and i knew.....my brain is smooth, very smooth
Saw some comments complaining about the music, and just wanted to say that I disagree completely. Having some music is nice, and the choices you made are very low energy and out of the way anyways.
Yeah, I can see your point. Maybe he could sidechain the music so it dips in volume more when he speaks.
I think it's fine as well, though it would be nice to have an option on YouTube to turn off the bg music if you have your own preferred soundtrack. The video would need to be made without merging the soundtracks, making the file a bit bigger 😅
I don't think that should be a huge issue 🤔
Slight gripe: at 9:00 it was a bit confusing that you described the operations as "outputting a bit representing which half we're in" and "centering and renormalizing" instead of just saying we want to find a number within the range, that halving is just how binary fractions work, and briefly pointing out why the "centering and adding 0s" works.
Otherwise interesting video, and cool subject, thanks!
Yeah, he lost me at that part, too.
I can see the intention of the video
But what he doesn't realize is that working in full-precision probabilities is not practical; that's not how machines work, and it's ultimately extremely inefficient.
It's a classic "looks good on paper" but in practice you are trying to force something that doesn't fit
Took me some time wondering what he was getting at, but it works I guess.
I understood at the end of his explanation that it was just writing the number in base 2
@@k1ll3trs86 I don't understand what you're trying to say. This is not only good on paper but good in practice too. It is literally strictly better than Huffman, and Huffman is used everywhere.
@@anonymous-q2b5s I think what's meant is that keeping the whole real number in memory and then writing it out in binary is a bad idea (because doing math on long real numbers is a bad idea).
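For what it's worth, here's a toy sketch of the "output a bit for which half, then renormalize" step being discussed (floats, no underflow or straddle handling, purely illustrative):

```python
# Emit bits while the interval [low, high) fits entirely in one half of [0, 1),
# renormalizing (zooming in) after each bit. Real encoders use integers and
# handle the case where the interval straddles 1/2 for a long time.
def emit_bits(low, high):
    bits = []
    while True:
        if high <= 0.5:                          # whole interval in lower half
            bits.append(0)
            low, high = low * 2, high * 2        # zoom into [0, 0.5)
        elif low >= 0.5:                         # whole interval in upper half
            bits.append(1)
            low, high = low * 2 - 1, high * 2 - 1  # zoom into [0.5, 1)
        else:
            return bits, (low, high)             # straddles 1/2: can't decide yet

print(emit_bits(0.60, 0.70))
```

Each emitted bit is just the next binary-fraction digit shared by every number in the interval, which is the point the parent comment was making.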
This was a heroic effort at an explanation video, but it should really have been a 10 video series at least.
There is a FUSE file system called PiFS. It can store data using some sort of arithmetic coding where the resulting number itself gets compressed as an offset + length into the unlimited digits of Pi.
Relying on the assumption that pi is a normal number... which is probably true but not proven!
@@vurpo7080That would be one hell of a bug to run into some day.
@@vurpo7080 even if it's not, you can just split the data you want to add, might even act as a kind of data de-duplication
i would genuinely like to see someone make a semi-serious version of that
What a wonderful presentation and justification of ANS you have posted. I loved watching it, although the speech is a bit fast; I used 0.75x speed 😊. I would recommend you create more videos explaining other compression algorithms too, which in fact have arithmetic roots. Highly recommended and highly appreciated!
One video I would want could be on different methods of approximation, and a second on almost all the mostly forgotten bitcode generation methods in one go. Also error correction, Hamming codes, Reed-Solomon codes, etc.
Looking forward to more content ❤
I stumbled upon ANS for the first time in this video, so thanks, but I didn't understand it at all. I had to watch Stanford's EE 274 on the subject, where it seemed... approachable, at least. Not that a video of this length was going to fully explain such fairly advanced topics (the pace is quite fast), but it was an excellent introduction to the subject, and it motivated me to do further research, so goal achieved I guess. Also the effort in making it really shows.
stuff like this is why I'm glad I took AP stats so I can come back and understand what I'm listening to later on this year
I got a 5 on AP stats and I'm pretty lost lol. This video is less stats heavy, at least the stuff you learn in AP, and more computer science jargon heavy. It also just moves uncomfortably fast for the stuff I have some handle on.
I am usually not in the "wow this is so cool but I didn't understand anything" group, but I think I might be with this vid. The video is paced pretty fast, if I wanted I would probably rewatch it. Currently I just feel like falling into a rabbit hole (which happens to me quite a bit) but not keeping up with the falling.
Haven’t watched yet but there’s a neat thing: arithmetic coding, or a closely related thing, allows one to convert a sequence of fair N-sided coin outcomes to any other discrete uniform distribution without rejection, though if you don’t want to reject you’ll need unbounded memory, and if you want to flush time to time, you’re effectively rejecting some information you’re getting from the initial sequence. But at least you can fine-tune how much you want to reject/save and change it over time!
Maybe it’s already in the video but if it’s not, here. Also it’s as if there’s some duality about entropy generated by rejection, non-uniformness introduced if we decide to reject only a given number of times and then be complacent, working memory etc.. I don’t know in what kind of inequality this can be written but I hope information/coding theorists have it already discovered and generalized similarly to what happened to the uncertainty principle (in QM and Fourier analysis-now there exists a general operator algebra approach IIRC).
For the folks interested in functional programming or calculating correct programs, Richard Bird's "Pearls of Functional Algorithm Design" has two chapters on arithmetic coding, one on the rational number variant and one on the integer variant.
Arithmetic coding is widely used. It was the only option for some standards, even before the patents expired. So it got used, patents or not. I've had to implement it for things like JBIG coding for FAX, where it's specifically there to improve on the Huffman coding in the previous compression methods used for FAX.
Nice to see someone who appreciates the second Arabesque!
I wonder why Apple rolled out HEIC instead of just using arithmetic compression, considering it would already be incompatible with standard JPG?
There is a technical and a business reason.
The technical is that HEIC - like the rival AVIF - goes far beyond just arithmetic compression. It's really an adaptation of techniques developed for video compression, and will wipe the floor with JPEG, even arithmetic-compressed JPEG.
The business reason is that HEIC, being based on the h265 video codec, is a heavily patented format - and one of the contributors to that patent pool happens to be Apple. Not only do they get to use it at a reduced licencing cost, but whenever another company wants to use HEIC, Apple gets a cut of the royalties. It's clear that JPEG is going to decline, and there are a number of formats competing to replace it - it's in Apple's best interests to make sure that the format they stand to profit from is the one to achieve dominance, and making HEIC the default format for taking photos on iPhones and iPads gives it a huge advantage - every photo-sharing service needs to at least support uploading HEIC, or else Apple users will find their photos won't share.
There's a rival format, AVIF, which is supposed to be patent-free (you can never be sure) and pushed by a rival consortium, with Google as the major backer. They aren't trying to score any patent money, but that doesn't mean it's entirely altruistic: Their aim is to promote an open and unpatented format in order to avoid having to pay up royalties to the likes of Apple.
8:45 How do you store all the messages and allow easy lookup??? There are a few approaches here: one really excellent, and many adaptive ones.
I'll be honest it was hard to follow, maybe I'm just dumb 😅
The music was fine imo, but if people complain you can "duck" it (the volume lowers on speech).
no you're not dumb. This video seemed to be made for an audience that already knows compression algorithms and information theory at a competent level.
@@jeffreychandler8418 Not even, I'm pretty familiar with many of the concepts but he's just running through the ideas with no space to breathe or re-emphasis on the critical aspects.
@@ArnaldurBjarnason That's validating to hear xD I usually ascribe misunderstandings as a personal issue than a communicator issue (unless it's REALLY bad like the Zion NPS instagram or something like that).
As I think about this video again, honestly a two minute succint summary of existing algos would be great, then trim the fat on the new algos to make those simpler, and allow more breathing space.
I actually used both algorithms in my lossless video codec, ScreenPressor. Initially it was range coding (arithmetic coding with integers), but that involved division, which isn't very fast. A later version used rANS, which is faster but requires compressing data in reverse order. The end result was pretty great: best-in-class compression and good speed (when the content is not too complicated, as in typical screen data).
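A minimal rANS sketch (a toy illustration, not ScreenPressor's implementation; the two-symbol frequency table is made up) showing both the state update and why encoding runs over the symbols in reverse:

```python
# Toy rANS: no renormalization/streaming, state grows as a big integer.
# Frequencies must sum to M. Decoding pops symbols in the opposite order
# to encoding, hence the reversed() in encode.
freq = {"a": 3, "b": 1}          # P(a)=3/4, P(b)=1/4 (assumed model)
M = sum(freq.values())
cdf = {"a": 0, "b": 3}           # cumulative frequencies

def encode(symbols, x=1):
    for s in reversed(symbols):
        x = (x // freq[s]) * M + cdf[s] + (x % freq[s])
    return x

def decode(x, n):
    out = []
    for _ in range(n):
        slot = x % M                      # which symbol's slice the state is in
        s = "a" if slot < 3 else "b"
        out.append(s)
        x = freq[s] * (x // M) + slot - cdf[s]  # exact inverse of the encode step
    return "".join(out)

msg = "aababa"
state = encode(msg)
print(state, decode(state, len(msg)))  # 69 aababa
```

Frequent symbols multiply the state by roughly M/freq per step, so likely symbols cost fewer bits, which is the whole trick.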
I understood about 25% of the video on my first try!
I'm stuck on the variable-length encoding in your explanation. "Shorter codes encode more frequent symbols" - how do you indicate the number of bits that should be decoded into a single symbol? Presumably, short codes would exist inside longer encodings; for instance, "101" is inside "1011", and "10" is inside "101".
In the Huffman algorithm, the codes used are designed as such that no shorter code is inside a longer code.
So in the example you've provided, 1011 would mean you have the symbol "101" and then you have another symbol starting with "1". The way huffman is usually taught in school, this involves following a tree, where 1 is right and 0 is left, and the decoder traverses that tree with each bit you read until you eventually hit a symbol, output that symbol, and then go back to the top of the tree.
with that explanation, the symbol encoded for 101 would mean you went right, left, right in the tree, hit a symbol, and went back to the top.
Hopefully this explanation is easy to understand, it would be better with diagrams but those are hard to make in youtube comments.
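A small sketch of the tree-walk decoding described above, using a made-up prefix-free code table (the symbols and codes are illustrative, not from the video):

```python
# Prefix-free code: no code is a prefix of another, so the decoder never
# has to guess where a symbol ends.
codes = {"e": "0", "t": "10", "x": "110", "q": "111"}

# Build the binary tree: 0 = left branch, 1 = right branch; leaves hold symbols.
tree = {}
for sym, code in codes.items():
    node = tree
    for bit in code[:-1]:
        node = node.setdefault(bit, {})
    node[code[-1]] = sym

def decode(bits):
    out, node = [], tree
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):   # hit a leaf: emit symbol, restart at root
            out.append(node)
            node = tree
    return "".join(out)

print(decode("10011010"))  # "text"
```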
@@inanefool8781 You can pause the video at 1:48 to see this effect; all the shorter codes are not contained within the longer ones.
@@inanefool8781 That is actually a very good explanation that is easy to understand. Thanks. Hopefully I can remember it this time.
By the way a coding algorithm where this property holds (no word is prefix of another one) is called a prefix code.
IIRC from reading CLRS, a non-prefix code can always be transformed into a prefix code, so textbooks usually focus on prefix codes.
Though I have very little knowledge about codes, so I might be wrong.
Imagine leaving your hard work public for the sake of humanity and greedy corporation takes credit for it and ends up forgetting it in their patent drawer
You can use the JUSDTB compression (JUSDTB stands for Just Use Smaller Data Types Bro)
This doesn't actually exist, it's a joke.
this is buhdj
bro
ur
hillarious
dad
jokes
I'm pointing at YOU, database users! You should know the data types and use them accordingly.
You don't need a string to store one of 3 states, use an enum.
You don't need an int for a number between 0 and 100, use a byte. It's 4x smaller.
You should always use unsigned if you don't need negatives. It doubles the positive range.
Also use decimal, it can store decimal places more precisely than float. If you don't have a horrendously huge range, just use it.
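To illustrate the point with a quick (if tongue-in-cheek) check, here's a sketch comparing the packed size of the same three small values stored as 32-bit ints vs. single unsigned bytes:

```python
import struct

# Three values that all fit in 0..255, packed two ways.
wide = struct.pack("<3i", 2, 87, 255)    # three 32-bit ints -> 12 bytes
narrow = struct.pack("<3B", 2, 87, 255)  # three unsigned bytes -> 3 bytes
print(len(wide), len(narrow))  # 12 3
```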
I prefer πFS personally
@@dadamaldasstuff1816 wdym? A string can be two bytes if it holds just one char (plus the appended null char). An enum is in most languages an integer, which can be 32 or 64 bits and use up more space. I'm sure the db stores it as one byte though.
I still agree that you should always use the right type though. Even helps you catch bugs early when you get screamed at for having the wrong type
It would be nice if the video with a title about halving image file sizes actually discussed halving image file sizes. Can you explain why this is specifically good for image compression? Are there any actual implementations of arithmetic encoded images and how do they compare to huffman encoded ones, in terms of filesize? How did you arrive at the 50% improvement value? JPEG-XL, which is the only format you mentioned that kinda used the ideas presented in this video, only has a guaranteed file savings of about 20% when using the JPEG Re-encoding feature on an image that was already a JPEG. How does this raw data stream compression algorithm compare to the image-specific compression algorithms used in various formats?
Quite an interesting subject; I didn't treat probability as something related to my field of interest until now. A year ago I started writing my own lossy audio codec, which is done for now, but I didn't implement any kind of entropy coding. I tried Huffman and RLE, but the only thing I got was a bigger file size and increased complexity. The problem with these algorithms is that there must be a defined set of symbols and a strong pattern in the encoded message, otherwise it won't work. Using raw binary sequences as symbols doesn't really work, since there are quite a lot of them and the probabilities, sorted in decreasing order, are far from the shape of a 1/x function. The only pattern I saw was groups of 0's with fewer 1's in between. I still can't understand how MP3 pre-processes values so that they are suitable for entropy coding. By values I mean quantized frequency coefficients after the transform from time to frequency domain, MDCT in my case. After psychoacoustic analysis I know which coefficients can be discarded and what precision to use for the rest. Coefficients are divided into subbands of fixed size, let's say 16, and then quantized with different precisions. A single frame can have subbands quantized with both 2 bits and 16 bits, although now I know that 16 bits was way too much even for almost lossless compression. Each subband has a scalefactor field and a precision field, saying how many bits are used per coefficient and what scalefactor was used to divide values for quantization. Quantized values can be basically anything; I didn't see any strong pattern there. From what I heard, MP3 uses more or less fixed precisions for coefficients, and maybe that helps to entropy code them. But in my codec everything is dynamic and the bitrate can vary from, say, 20 kbps to 400 kbps for the same file and the same quality setting; 20 kbps would be used to encode a single tone, while 400 kbps would be needed for a drum hit.
Even quantized values are not fixed-length; for example, with 3-bit precision, 0 would be just 0 in binary, 1 would be 100, 2 is 101, -1 is 110, -2 is 111. Scalefactors and precision fields also have variable lengths and are differentially coded. There are also magic values for those fields which can specify the number of empty subbands, so a few tens of empty coefficients can be stored in a few bits. And these aren't all the clever mechanisms that I used instead of entropy coding. But I'd like to understand how to implement this coding for audio / how to prepare values for it. With that and some other changes I could create an even better codec, which already outperforms MP3 a bit. It's called DAC (Dynamic Audio Codec), and I have a video about it. Unfortunately it's in Polish, but today that shouldn't be a problem. I also shared the working codec (a browser is all you need to run it), so take a look if you want. Anyway, thanks for the video and I'm waiting for the next one!
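A sketch of that variable-length scheme as described above (the codebook is copied straight from the comment; the helper names are made up). The point is that it's a prefix-free code, which is why a decoder can split the bit stream unambiguously:

```python
# Codebook exactly as described: 0 is cheap (1 bit), nonzero values cost 3 bits.
CODE = {0: "0", 1: "100", 2: "101", -1: "110", -2: "111"}
DECODE = {v: k for k, v in CODE.items()}

def encode(values):
    """Concatenate the codewords for a list of quantized values."""
    return "".join(CODE[v] for v in values)

def decode(bits):
    """Greedy prefix decoding: the first codeword match is always correct."""
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in DECODE:
            out.append(DECODE[buf])
            buf = ""
    return out

print(encode([0, 1, -2, 0, 2]))   # "01001110101"
print(decode("01001110101"))      # [0, 1, -2, 0, 2]
```

Since many coefficients are zero, the average cost per value drops well below 3 bits, which is the same effect entropy coding exploits.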
Oh, and there is a link to a video about DAC. YT sometimes removes such comments, so I'll share just the video ID: Qy_AinciI9Y
11:56 took me [I stopped counting at 6] times listening before I could hear "extra digit".
Thank you this helps me to understand arithmetic encoding and ANS.
Both (+ Huffman) assume the symbols are independent. Could Markov chains improve compression rates, or do techniques based on repeated occurrences make Markov chains irrelevant?
i dont understand anything :(
Same, i have no idea how to code this at all, or if it even makes sense😔
@gnt10 I've even mastered some coding skills but still don't understand :D
I found the comment where I belong. 😊
😁
Great manim animations. It’s a shame ANS isn’t as widely used. There are similar patent issues with lossy compression algorithms as well, but those are even more complex since there are two vectors of optimization (compression and reconstruction accuracy) where the second does not have a concrete way to measure it.
There are some synthetic benchmarks, like 7zip used to show programs compressed with zip vs 7zip. A standardised benchmark may be useful, though compressors could overfit for that benchmark and look better than they really are
@@Aera223 well zip and 7zip are still lossless, but as for lossy a benchmark doesn’t exist for that reason. Some of the reconstruction quality will be subjective. PSNR vs Compression Rate is the most common metric but PSNR can be high while having a very noisy output, or a singular spike of error, as long as it averages out. Some situations lossy compression will have rules for the application such as not deviating more than X amount, or always producing smoothed results
AMR-WB+ is in the same place for audio encoding/decoding. It's a very specific codec used in talking books whose patent expired in 2020
THIS VIDEO IS SOOO GOOOOD!
When i implemented arithmetic coding a few years back, i had no idea of the fraught patent history. Nice bit of info.
This channel is a hidden gem :O
Really well done video. Thank you!
That was quite interesting!
Also does anyone knows the song at 10:35?
What about AI? It predicts the probability of the next tokens, so why not do the Huffman thingy given those priors?
The probability prediction and the encoding method are two separate steps, and the video just focused on the latter and not the former.
In most of the video, the probability is fixed (or at least known) and independent, so there's no need to predict anything.
In benchmarks using real world data, AI is indeed used for the probability prediction; for example, the top 23 (highest compression ratio, lowest file size) compressors of the enwik9 dataset all used some sort of AI and the current best (nncp v3.2) used a Transformer model.
In real life applications though, AI is too slow compared to simple, non-AI predictors, so the smaller file size is hardly worth the extra time and power cost. While nncp v3.2 can compress enwik9 to 49% of the size zstd achieves (zstd 0.6.0, level 22 ultra), it takes 9.6x as much memory and 683x as much time to compress and then decompress the dataset.
0:28 While of course it is stored as a number, any piece of data on a computer is a series of 0 or 1 bits; they are not themselves numbers, but any series of them can also be represented by a number.
3:10 Why is there a factor of 1/N if it approaches infinity? Won’t that just lead to 0
1/n approaches 0 as n goes to infinity but the sum of the series of 1/n diverges as n goes to infinity.
1/N represents the dividing by N involved when calculating the average(mean)
I don't really know what all the terms/notation are in that equation so I can't really analyze it properly, but probably the sum is an infinite value, so it becomes a 0*infinity indeterminate form.
The summation scales up linearly as N increases so it'll just diverge to infinity. If your message is twice as long, on average you'll need twice as many bits to encode it. So, the 1/N is there to give us a more useful number to measure coding efficiency.
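The 1/N point in a tiny sketch (the toy distribution and function names are mine, not from the video): for a memoryless source, an optimal code costs about N·H bits for a message of N symbols, so the total grows linearly with N and dividing by N recovers the per-symbol rate.

```python
import math

def entropy(p):
    """Shannon entropy H(X) = -sum p_i * log2(p_i), in bits per symbol."""
    return -sum(pi * math.log2(pi) for pi in p.values() if pi > 0)

p = {"A": 0.5, "B": 0.25, "C": 0.25}
H = entropy(p)
print(H)  # 1.5 bits per symbol

# An optimal code costs about N*H total bits for N symbols; the total
# diverges as N grows, but (N*H)/N = H stays fixed -- that's what the
# 1/N factor extracts.
```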
I realize I came up with the arithmetic numbering system algorithm when I was 14 while creating a chess compressor. Literally that algorithm.
I hope some google/amazon data centre engineers are taking notes, if they can hear the video over the water pumps out there in the Nevada desert.
I first heard of this about 30 years ago, and I was so excited that I wrote it in C.
I can acknowledge it's compressing better than Huffman, but twice as well? Maybe with very specific data.
But it was quite slow. Most probably because i wrote the adaptive version.
It should be written in assembly, because it has some very special needs that no higher programming language provides. And writing a workaround to get that missing feature in C is a pain in the afternoon and also slow.
Maybe using libgmp will be helpful if you don't want to use assembly - libgmp is written in assembly, as far as I know.
A downside with a lot of these is that they are slow: Arithmetic coding = Very slow, Bitwise Range Coding = Rather slow (but faster than traditional arithmetic coding), LZMA / 7zip uses this. Huffman can be made semi-fast (and a little faster if one limits symbols to 12 or 13 bits vs 15 or 16; with a lower limit the decoder's lookup table can fit into the L1 cache).
Rice coding can also be made fast (more so when a similar length limit is applied), but only works well with particular probability distributions; it can be turned into an adaptive scheme with relatively little added cost, though. One trick I have used is to combine adaptive Rice with a "swap towards front" table (each symbol is encoded as an index into a table, and when encoded, the symbol at index J is swapped with the symbol at index (J*15)/16, which is used the next time this symbol is encoded). This scheme is adaptive, can be almost as fast as Huffman, and almost as good (in terms of compression), while needing a smaller memory footprint (sometimes useful) and less code (also sometimes useful), and (unlike Huffman) the decoder lookup tables can be made read-only. But a lot depends on what one needs...
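A minimal sketch of that scheme as I read the description (class and function names are mine, not from any real codec): Rice-code the table index, then swap toward the front with the (J*15)/16 rule. Encoder and decoder start from the same table and apply the same swaps, so they stay in sync.

```python
def rice_encode(v, k):
    """Rice code: unary quotient (q ones + a terminating 0), then k remainder bits."""
    q, r = v >> k, v & ((1 << k) - 1)
    bits = "1" * q + "0"
    if k:
        bits += format(r, f"0{k}b")
    return bits

def rice_decode(bits, pos, k):
    """Decode one Rice codeword starting at pos; return (value, new_pos)."""
    q = 0
    while bits[pos] == "1":
        q += 1
        pos += 1
    pos += 1  # skip the terminating 0
    r = int(bits[pos:pos + k], 2) if k else 0
    return (q << k) | r, pos + k

class AdaptiveRice:
    def __init__(self, alphabet, k=2):
        self.table = list(alphabet)  # small indices = "currently frequent" symbols
        self.k = k

    def _bump(self, j):
        # Swap towards the front: frequent symbols migrate to small indices,
        # which get short Rice codes.
        t = (j * 15) // 16
        self.table[j], self.table[t] = self.table[t], self.table[j]

    def encode(self, msg):
        out = ""
        for s in msg:
            j = self.table.index(s)
            out += rice_encode(j, self.k)
            self._bump(j)
        return out

    def decode(self, bits, n):
        out, pos = [], 0
        for _ in range(n):
            j, pos = rice_decode(bits, pos, self.k)
            out.append(self.table[j])
            self._bump(j)
        return out

enc = AdaptiveRice(range(256), k=2)
bits = enc.encode(b"abracadabra")
dec = AdaptiveRice(range(256), k=2)
print(dec.decode(bits, 11) == list(b"abracadabra"))  # True
```

No code table needs to be transmitted, and the decoder's state update is just an index swap, which is part of why this can approach Huffman speed.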
thank you great insight - good explanation - there were some wow moments - thanks
This is an interesting topic and discussion, but the presentation and explanations were really hard to follow. The music was *mostly* unobtrusive, but there were a couple points where it really came into the foreground due to the tonal content and became distracting
It is important to understand that the Huffman Encoding requires that it build a "dictionary" (Huffman Tree) with the encoding and that must be included in the file so it increases the size of the compressed file. Additionally, the "encoding" used by Huffman is "Prefix-Free", meaning that no symbolic encoding can be a prefix for any other encoding used in the compression dictionary. It might have been one of the reasons why people were having trouble following. Maybe add the tree and prefix free quality in your next revision.
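The prefix-free property is easy to check mechanically. A small sketch (toy frequencies borrowed from the classic CLRS example; helper names are made up):

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman code from {symbol: frequency}; returns {symbol: bitstring}."""
    tie = count()  # tie-breaker so heapq never compares symbols/trees directly
    heap = [(f, next(tie), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (a, b)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):       # internal node: recurse
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                             # leaf: record the codeword
            codes[node] = prefix or "0"   # degenerate one-symbol alphabet
    walk(heap[0][2], "")
    return codes

codes = huffman_codes({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5})
# Prefix-free check: no codeword is a prefix of any other codeword.
words = list(codes.values())
assert all(not w2.startswith(w1) for w1 in words for w2 in words if w1 != w2)
```

Because every codeword sits at a leaf of the tree, no codeword can be a prefix of another, which is exactly what lets the decoder split the bit stream without separators.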
Not finished the video yet, but are they as efficient at decoding? I realise there's a lot of overhead on modern devices, but decoding and rendering images happens so often in a normal user session that it makes some sense to trade minimal disk space, in the big picture, for faster decoding. Just an idea for another reason they may not have moved on either?
Even if its slower to decode, just like zip files take a while to open, it would be worth doing it. Transferring compressed data through a network is worth the cost of computation and time it takes to encode/decode at the sender or receivers end, because transferring uncompressed data takes even more time and money than that
@@arxalier2956 not all images get transferred through networks though and webmasters have to make tradeoffs between optimization on their end and load speed on the users end. You lose something like 50% of your visits if the page still is loading after 3-5 seconds
ANS is super fast at decoding. Especially tANS.
@@uis246 faster than Huffman? If so, interesting. Was just an idea that came to mind.
@@lainwired3946 oh. I thought you was asking ANS relative to arithmetic coding. Derp. Arithmetic coding is slower than Huffman and ANS I think is faster than Huffman.
I recommend anybody using this method to also try to implement more layers: compressing with algorithms and compressing neighbour pattern delimiters with some form of character.
This will greatly improve the compression and complexity of your data!
literally anything can be interpreted as "a single number"..
that's pretty much the point of the video yea
@@ZeroPlayerGame Find the probability of everything, and interpret "1" as the most probable thing, "2" as the second most probable thing, etc. Voila, optimal compression! there may be some slight implementation details to be worked out...
Seriously though, one thing I found out many years ago which stuck with me, is that most compression algorithms don't use the whole number line this way. In fact, they only use a tiny fraction of the numbers on the number line. Case in point: A random file is practically never a valid zip file, not even if you strip away headers and other format bookkeeping stuff.
Even arithmetic encoding and ANS (as described here) leave numbers on the table, so to speak. Mostly due to end handling. Defining a bijective compression scheme (where every sequence of bits decodes to just one message, and all messages can be encoded) is fiddly and most compression people don't see the point. But I think such algorithms are beautiful.
@@Mnnvint what's "2"? What's your alphabet? Huffman is bijective. Arithmetic coding is bijective if you encode a whole number of bits (might not be the case). The number line is a mathematical abstraction and doesn't exist in the physical world where encodings have to live.
@@ZeroPlayerGame Find the probability of the most likely message, and assign it to "1" on the number line. Find the second most likely message, and assign it to "2". There is no alphabet, we deal with the whole message at once.
"Encodings" are a mathematical abstraction too, so they certainly don't have to live in the physical world if numbers don't :) Personally I like mapping things to the natural numbers/positive integers.
I specifically said arithmetic encoding is not bijective *as described in this video*. There exist bijective arithmetic encoding implementations. There even exist bijective variants of Huffman, though that should probably be considered a variant and not just a different implementation.
Bijective compression has a special meaning. It doesn't just mean that there's a bijection between messages and encodings, that's a much lower bar (although most compression methods don't meet that either). Bijective compression means that in addition to satisfying this weaker constraint, every sequence of bytes is a valid file in the compression format. There can be no "unexpected end of file" or other format errors, any sequence of bytes has to decode to a unique sequence of bytes.
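The "every input is valid" idea in miniature: a sketch (my own toy example, not any real format) of the standard bijection between bit strings and natural numbers, where no input is ever malformed:

```python
def bits_to_nat(bits):
    """Bijection {0,1}* -> N: prepend an implicit leading 1, read as binary, subtract 1."""
    n = 1
    for b in bits:
        n = 2 * n + b
    return n - 1

def nat_to_bits(n):
    """Inverse direction: add 1, write in binary, drop the leading 1."""
    n += 1
    out = []
    while n > 1:
        out.append(n & 1)
        n >>= 1
    return out[::-1]

print(bits_to_nat([]))      # 0
print(bits_to_nat([1, 0]))  # 5
print(nat_to_bits(5))       # [1, 0]
```

Every bit string maps to exactly one natural number and back, so there is no "unexpected end of file" and no unreachable output; that is the property a fully bijective compressor needs for its whole file format.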
what about the fidelity of the image ?
I wonder if P-adic numbers could help improve these algorithms
Erm, pretty sure I'm watching this video encoded with arithmetic coding. It's the basis of the H.265 and H.264 video codecs, along with their still image profiles.
H.264 baseline profile is closer to Huffman, but the main profiles expect to use context adaptive binary arithmetic coding.
Great animations. I understood nothing about the proof but the examples are nice. Is there a specific software you are using for this presentation?
Nice video! Just one suggestion: don't let the video black-out for so long while you're talking. I was confused thinking something was wrong with the image, because you were still talking regardless hahaha
is there any legal risk to bake the ideas pointed at in this video into a commercial product ?
Interesting topic, good explanation, nice animations, and a special thanks to Frédéric Chopin.
YouTube is as bad as the US patent office and doesn't recognise all authors of the music. YT didn't recognise Satie either.
blipblipblirip, you're jobs taken😂
Middle out?
Do we have a new contender for TechCrunch Disrupt??
13:13 I'm not a big fan of the probability part of this, but I suppose it is likely a good thing overall, seeing as it is beneficial to your average image or text. Unless the entire thousand-letter text document is just a repeated "A", you'll have no problem.
What about markov compression? 😮
I got lost in the first 10 minutes haha, all the math killed me
Some day I will grok how you output a Huffman file with the weights included, or possibly how they are inferred from an adaptive version - and not taking up masses of space, relatively, for small files.
1:12 azumanga daioh reference😺
07:41 but now 0."10" and 0."100" are duplicates
I suggest, for research, using nibbles (0-9, A-F) instead of decimal or binary digits (bytes as digits are too many symbols).
"this is one of the way your jpegs [...] compress data without losing information"
jpeg: lossless compression
jpeg does lose information, but it was optimized to not lose too much *meaningful* perceptible information (that is until you zoom in)
JPEG uses both lossy and lossless compression. The lossy compression first, then lossless compression of that as a last step. Obviously with the final result being lossy.
The lossless last step is a variant of Huffman.
So to say that JPEG compresses data without losing information is a half truth. Badly worded on his part, I'd say, without clarifying that it uses both and the final outcome is indeed lossy.
Huffman also accepts that "01" can be multiple characters like "you"?
Also H(X) is sometimes called Shannon's measurement of Information (SMI)
“i reject your rejection”
I thought that x264 uses arithmetic compression.
I almost understood this but there was a damn plane kept flying over my head
I think you'd make a great youtuber, depending on how long it took you to make this. If you keep the style and make some in-depth videos on things where online resources suck donkey like shaders or common library implementations you'd probably hit 100k before the year ends.
What if someone independently discovers these patented algorithms? Can they still be sued? Are they legally allowed to use them?
You made Jarek happy. He looks pleased in JXL discord server.
Before 11:11, I was like great, this is cool but doesn't seem that helpful
After that simple change, everything makes so much sense
Just do a binary search on the list of all words? brilliant!
This channel is boutta blow up