Zip vs Tar.gz Files Explained and Compared (Archiving and the DEFLATE algorithm)

  • Published 23 Jul 2024
  • In this video, I explain the similarities and differences between two popular compressed archive formats: zip on Windows, and tar.gz in the *nix world. Both formats typically use the same compression (DEFLATE) and also serve as a way to collect files together in an archive; however, there are some fundamental differences in how they work, which is why they're used for different purposes.
    I discuss the two main steps (linearizing the files into an archive, and compressing them), how each stage differs between the two formats, and the advantages and disadvantages of each implementation. I hope you enjoy the video and learn something new!
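    The difference between the two stages can be seen in a quick experiment with Python's standard `tarfile` and `zipfile` modules. This is a minimal sketch, not a benchmark: the 50 identical, randomly generated files are a hypothetical case chosen to make the effect obvious. tar.gz compresses the whole linearized tar stream at once, so DEFLATE can reuse matches across files, while zip compresses each member on its own.

```python
import io
import random
import string
import tarfile
import zipfile

# Hypothetical test data: 50 files with identical, randomly generated content.
# Within one file the text is hard to compress, but across files it repeats.
rng = random.Random(1)
content = "".join(rng.choice(string.ascii_letters) for _ in range(2000)).encode()
files = {f"file{i:02d}.txt": content for i in range(50)}


def make_tar_gz(files: dict[str, bytes]) -> bytes:
    """Stage 1: linearize the files into a tar stream; stage 2: gzip the whole stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def make_zip(files: dict[str, bytes]) -> bytes:
    """Zip compresses each member separately with DEFLATE, then appends an index."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


tgz_size = len(make_tar_gz(files))
zip_size = len(make_zip(files))
print(f"tar.gz: {tgz_size} bytes, zip: {zip_size} bytes")
```

    With dissimilar real-world files the gap is usually much smaller, and zip keeps the advantage of being able to decompress a single member on its own.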
    This is yet another video that is a bit rambly at times (I cut it down from over 30 minutes). I have added timestamps so you can easily skip to the sections you're interested in. I apologize in advance for the clipped (and heavily compressed) audio; it was accidentally recorded with a 10 dB boost on the mic, on a 128K MP3 recorder, making for a questionable combination.
    Timestamps
    Introduction
    00:00 - Introduction
    00:19 - What is a zip and tar file?
    01:45 - Gzip (and other compressors)
    02:45 - Why use tar instead of zip on Linux?
    What's the difference?
    03:48 - How does tar work?
    04:39 - How does zip work?
    Comparison
    05:09 - The advantages of zip
    06:46 - The advantage of tar.gz (and streaming compression)
    08:30 - The disadvantage of tar
    Further learning
    09:30 - Notes on 7zip and .gz.tar
    10:40 - Indexed tar files with pixz (and comparison to 7z)
    13:30 - What should you use?
    15:10 - Conclusion
    Links (Get Smarter Section)
    DEFLATE algorithm:
    Wiki: en.wikipedia.org/wiki/Deflate
    How DEFLATE works (good summary): zlib.net/feldspar.html
    Full specification: datatracker.ietf.org/doc/html...
    TAR format:
    Wiki: en.wikipedia.org/wiki/Tar_(co...)
    Man page: linux.die.net/man/1/tar
    Tar format specs: www.gnu.org/software/tar/manu...
    Gzip: (based on DEFLATE)
    Wiki: en.wikipedia.org/wiki/Gzip
    Homepage: www.gnu.org/software/gzip/
    Bzip2:
    Wiki: en.wikipedia.org/wiki/Bzip2
    Homepage: www.sourceware.org/bzip2/
    XZ Utils: (LZMA2 compression)
    Wiki: en.wikipedia.org/wiki/XZ_Utils
    Homepage: tukaani.org/xz/
    pixz: (parallel indexed xz)
    github.com/vasi/pixz
    pigz: (parallel implementation of gz)
    github.com/madler/pigz
    Lzip: (based on LZMA)
    Homepage: www.nongnu.org/lzip/
    LZMA2 Compression:
    Wiki: en.wikipedia.org/wiki/Lempel%...
    Z-Standard Compression: (aka zstd)
    Wiki: en.wikipedia.org/wiki/Zstd
    Homepage: facebook.github.io/zstd/
    Source: github.com/facebook/zstd
    7zip: (also generally LZMA)
    Wiki: en.wikipedia.org/wiki/7-Zip
    Homepage: www.7-zip.org/
    Source code: sourceforge.net/projects/seve...
    p7zip (POSIX port): p7zip.sourceforge.net/
    Zip: (generally DEFLATE)
    Wiki: en.wikipedia.org/wiki/ZIP_(fi...)
    Specs: pkware.cachefly.net/webdocs/c...
    Dar: (a newer format competing with tar)
    dar.linux.free.fr/
    Content used:
    Zip and Tar icons in thumbnail from FlatIcon.
    Ending music is We'll Meet Again by TheFatRat
    Clarifications and Corrections
    Just to clarify a few things before I get some comments: the 'only decompress the file' benefit I mentioned for zip exists because zip (and 7z) keep an index of the archive's contents (in zip this is the central directory, stored at the end of the file). If you did .gz.tar, you wouldn't get that benefit, as tar isn't indexed. Next, when I say 'the index of the tar file is at the end', what I mean is that if you want the file list (like an index would produce), you need to read the archive to the end as though there were an index there. Tar files don't have an index, just a 512-byte header in front of each file, so to get a file list you need to read the header at the start of every file, which means reading (and, for .tar.gz, decompressing) the full archive. I hope this clarifies it.
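    To illustrate why listing a .tar.gz means reading the whole thing, here is a minimal sketch of a tar header walker in Python. It assumes plain ustar headers (no GNU long-name or pax extensions, no use of the ustar prefix field) and skips checksum validation; the function name `list_tar_gz` is hypothetical. Every 512-byte header has to be visited, and because the stream is gzip-compressed, skipping over a file's data still means decompressing it.

```python
import gzip
import io
import tarfile

BLOCK = 512  # tar headers and data are stored in 512-byte blocks


def list_tar_gz(stream) -> list[tuple[str, int]]:
    """Return (name, size) for each member by reading every ustar header in order."""
    entries = []
    with gzip.GzipFile(fileobj=stream) as gz:
        while True:
            header = gz.read(BLOCK)
            if len(header) < BLOCK or header == b"\0" * BLOCK:
                break  # end-of-archive zero block (or truncated stream)
            name = header[0:100].rstrip(b"\0").decode()
            # The size field is 12 bytes of octal ASCII, NUL- or space-terminated.
            size = int(header[124:136].rstrip(b"\0 ").decode() or "0", 8)
            entries.append((name, size))
            # No index to consult: skip the data by reading (decompressing) it.
            gz.read((size + BLOCK - 1) // BLOCK * BLOCK)
    return entries


# Build a small .tar.gz in memory (ustar format) to try the walker on.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz", format=tarfile.USTAR_FORMAT) as tar:
    for name, data in [("a.txt", b"hello"), ("b.txt", b"x" * 1000)]:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

print(list_tar_gz(io.BytesIO(buf.getvalue())))  # [('a.txt', 5), ('b.txt', 1000)]
```

    By contrast, `zipfile.ZipFile(...).namelist()` gets the equivalent list for a zip from its central directory, without decompressing any member data.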
    Clarification: pigz is not indexed; pixz is. Pixz output is backwards compatible with xz, although both xz and pixz support multithreading these days.
    (more to be added)
  • Science & Technology

Comments • 27

  • @eduardmart1237
    2 years ago +10

    You make really interesting videos! Especially because they cover less popular, but really important topics about Linux.

  • @pavelperina7629
    2 years ago +3

    Minor stuff:
    Tar files have a 512-byte directory entry (header), aligned to 512 B, ahead of each file; that's basically one sector on old disks. File data is then padded with zeros to the next 512-byte boundary.
    Zip files have a directory entry (local file header) in front of each file, with far fewer attributes (but more than gzip, I believe), so the archive can be written as a stream; each directory entry is then repeated in the central directory at the end of the file. The offset to the central directory is stored at the very end of the file.
    IMHO the choices are:
    zip for maximum compatibility,
    tar.gz for compatibility within the Linux bubble and for archiving files including user permissions (which is pointless except for backups)
    7z for maximum compression if it's worth the time, and relatively good compatibility within the IT bubble
    lz4 for maximum speed on large sparse data, for internal use
    zstd as a trade-off of good speed and compression for general-purpose use (it beats ZIP's inflate/deflate almost every time in both (de)compression speed and ratio), but only for internal use, as it's not widespread and it's not an archive format, so it needs a container such as 7-Zip, and 7-Zip itself has to be patched to support it
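    The 512-byte header and zero padding described in the comment above can be checked with Python's `tarfile` module. A minimal sketch, assuming the plain ustar format (set explicitly so no pax extended header is added) and a hypothetical 1000-byte file named `hello.txt`:

```python
import io
import tarfile

data = b"x" * 1000  # hypothetical 1000-byte file

buf = io.BytesIO()
# USTAR_FORMAT is set explicitly so that no pax extended header is written.
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    info = tarfile.TarInfo("hello.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
raw = buf.getvalue()

print(tarfile.BLOCKSIZE)             # 512
print(raw[:9])                       # b'hello.txt': name field at the start of the header block
print(raw[512:1512] == data)         # True: data begins right after the 512-byte header
print(raw[1512:1536] == b"\0" * 24)  # True: zero padding up to the next 512-byte boundary
print(len(raw))                      # 10240: two zero end blocks, then padding to tarfile.RECORDSIZE
```

    The final size shows one more tar convention: Python's `tarfile` ends the archive with two zero blocks and then pads the output to a multiple of its 10240-byte record size.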

  • @sknfer
    1 year ago +1

    I was going to sleep, then this video popped up. Great explanation of various topics, u deserve more subs

  • @aces8481
    1 year ago +2

    Very clear explanation, you are a prodigious talent, my friend

  • @vukanoa
    1 year ago

    This was very well explained. Thank you.

  • @szymonpiechutowski2340
    2 months ago

    Thanks for a very useful video!

  • @borsasostorangunt
    2 years ago

    Awesome video, keep making more!

  • @randomdamian
    3 months ago

    Awesome video!

  • @tulsatrash
    1 year ago

    Woo!
    Thank you for making this.

  • @Rainy32434
    2 years ago

    Great video, thanks!

  • @prashanthkumar0
    5 days ago

    5:50 Technically, zip files store their metadata at the end of the archive: the central directory, followed by the End of Central Directory record (EOCD).
    This makes it easier to add new files, as you just need to append them and rewrite the metadata at the end of the file.
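    The End of Central Directory (EOCD) record mentioned above can be located by hand. A minimal sketch using `struct`, assuming an archive with no trailing comment and no ZIP64 extensions (so the EOCD is exactly the last 22 bytes); the helper name `central_directory_offset` is hypothetical. The second half uses `zipfile`'s append mode to show new data going in where the old central directory was, followed by a rewritten directory.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"
# Fields: signature, this disk, CD start disk, entries on this disk, total entries,
# central directory size, central directory offset, comment length (22 bytes total).
EOCD_FORMAT = "<4sHHHHIIH"


def central_directory_offset(archive: bytes) -> tuple[int, int]:
    """Return (total entries, central directory offset) read from the EOCD record."""
    eocd = archive[-struct.calcsize(EOCD_FORMAT):]  # assumes no archive comment
    sig, _, _, _, total, _, cd_offset, comment_len = struct.unpack(EOCD_FORMAT, eocd)
    if sig != EOCD_SIG or comment_len != 0:
        raise ValueError("no comment-free EOCD record at the end of the data")
    return total, cd_offset


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", "hello " * 100)
    zf.writestr("b.txt", "world " * 100)
archive = buf.getvalue()

count, cd_offset = central_directory_offset(archive)
print(count)                             # 2
print(archive[cd_offset:cd_offset + 4])  # b'PK\x01\x02': first central directory entry

# Appending: the new member is written where the old central directory began,
# then a fresh central directory and EOCD are written after it.
buf2 = io.BytesIO(archive)
with zipfile.ZipFile(buf2, "a") as zf:
    zf.writestr("c.txt", "appended")
count2, cd_offset2 = central_directory_offset(buf2.getvalue())
print(count2)                            # 3
print(cd_offset2 > cd_offset)            # True: the directory moved past the new data
```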

  • @I_good_at_alaphabet
    months ago

    Thank you for this

  • @OscarCedano
    months ago

    Good Video!

  • @sharlove3508
    1 year ago

    wonderful explanation, ty😎

  • @sunnyyoda
    6 months ago

    Nice ❤

  • @ArmandoCalderon
    1 year ago

    great explanation.

  • @1aminepro
    2 years ago

    new subscriber here, love your content, if only you put that mic down

  • @ConorFenlon
    2 years ago +1

    Would it be possible to convert all files you want to compress to plain text files prior to compression? If the DEFLATE alg works better on text files, that would seem like a good idea, no? Is it more efficient to convert an mp4 to text, then compress, than just compressing the mp4 directly? 🤔 So many questions! 😂 Thanks for the explanations Tony. Keep up the great work! 😁👍🏻

    • @NielsGx
      1 year ago

      Bruh, what?
      An MP4 is an MP4; you can't "translate" it to txt, whatever that even means.
      When he says text compresses better, he's talking about compressing text made of the 26 letters of the alphabet; the algorithm was designed to compress language well, not really random stuff, because, y'know, we use languages lmao

    • @ConorFenlon
      1 year ago +1

      @@NielsGx Yes, you're absolutely right. We use languages. Like Machine Code, Binary Coded Decimal, Binary, Assembly Code, Hexadecimal. The list goes on and on and on. You can represent an mp4 video (or any other file type) in whatever type of encoding you want. Then we transmit that data using algorithms like BPSK and QPSK, using beams of light to shoot the data down massive undersea cables from continent to continent. Literally anything is possible. Even the words you're reading in this comment right now have been transmitted as strings of 1s and 0s to explain this to you. But of course, what do I know? I've only been studying Electronic Engineering and Computer Science since before you lost all your milk teeth.

    • @mgord9518
      2 months ago +1

      It's possible, but there's no advantage. DEFLATE compresses text better than binary because natural text typically has less entropy.
      When you convert binary to text (using hex, base64, base91, etc.), you cannot magically remove that entropy, so you get seemingly random text that's bigger than the original data.
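    The reply above can be checked quickly with Python's `zlib`, which implements DEFLATE. A minimal sketch, assuming random bytes stand in for already-compressed media such as an MP4 stream, and a hypothetical small-vocabulary word list stands in for natural text:

```python
import base64
import os
import random
import zlib

# Stand-in for already-compressed media (e.g. MP4 data): high-entropy random bytes.
media = os.urandom(100_000)
as_text = base64.b64encode(media)  # "converting to text" makes it about 33% bigger

# Stand-in for natural text: words drawn from a small vocabulary (low entropy per byte).
rng = random.Random(42)
words = ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]
text = " ".join(rng.choice(words) for _ in range(20_000)).encode()

for label, data in [("raw media", media), ("base64 media", as_text), ("text", text)]:
    packed = zlib.compress(data, 9)
    print(f"{label:>12}: {len(data):>7} -> {len(packed):>7} bytes")
```

    The base64 version compresses back to roughly the size of the original bytes but not meaningfully below it: the encoding adds bytes, not redundancy that DEFLATE can exploit, while the low-entropy text shrinks substantially.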

  • @ilhammega
    9 months ago

    I came across this video by accident. I need a tutorial on converting a WhatsApp account backup in tar.gz to txt. Can you give me a tutorial?

  • @Bladedomainandhosting
    2 years ago

    tar tvf setuptools-58.0.2.tar.lz   (find the file you want)
    tar xvf setuptools-58.0.2.tar.lz setuptools-58.0.2/tools/finalize.py
    no need for it to extract it all :)
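    The same selective extraction can be done with Python's `tarfile` module. A minimal sketch, assuming a gzip-compressed archive (the standard library's `tarfile` does not read lzip `.tar.lz` files) and hypothetical member names; the helper `read_member` is also hypothetical. Only the requested member is extracted, but because tar has no index, the headers are still read one after another until the name matches.

```python
import io
import tarfile


def read_member(archive: bytes, wanted: str) -> bytes:
    """Return one member's contents without extracting anything else to disk."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        # No index: headers are read one after another until the name matches.
        for member in tar:
            if member.name == wanted:
                extracted = tar.extractfile(member)
                if extracted is None:
                    raise ValueError(f"{wanted} is not a regular file")
                return extracted.read()
    raise KeyError(wanted)


# Build a small .tar.gz in memory with hypothetical contents.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    for name, data in [("pkg/setup.py", b"print('setup')"),
                       ("pkg/tools/finalize.py", b"print('done')")]:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

print(read_member(buf.getvalue(), "pkg/tools/finalize.py"))  # b"print('done')"
```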