What is the difference between tar and zip?

By itself, tar just bundles files together (the result is called a tarball), while zip also compresses them.

Usually you run the tarball through gzip to compress it, which produces a .tar.gz file. The result is similar to a zip.
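To make the split of responsibilities concrete, here is a minimal sketch using Python's standard `tarfile` and `zipfile` modules (Python 3.9+). The helpers `make_tar` and `make_zip` and the sample files are hypothetical. The sketch packs the same three small text files three ways: as a plain tar, as a tar piped through gzip, and as a zip. It then prints the resulting sizes.

```python
import io
import tarfile
import zipfile


def make_tar(files: dict[str, bytes], compress: bool = False) -> bytes:
    """Bundle files into a tarball; optionally gzip the whole tar stream."""
    buf = io.BytesIO()
    mode = "w:gz" if compress else "w"  # "w" = bundling only, "w:gz" = tar + gzip
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def make_zip(files: dict[str, bytes]) -> bytes:
    """Store each file as its own DEFLATE-compressed zip entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


# Hypothetical sample input: three small, highly repetitive text files.
files = {f"notes{i}.txt": b"hello tarball\n" * 200 for i in range(3)}
raw_size = sum(len(data) for data in files.values())

print("input  :", raw_size)
print("tar    :", len(make_tar(files)))
print("tar.gz :", len(make_tar(files, compress=True)))
print("zip    :", len(make_zip(files)))
```

The plain tar comes out larger than its input. That is because tar adds a 512-byte header per file and pads everything to fixed-size blocks. Only the gzip step actually shrinks it.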

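For reasonably large archives, though, there are important differences. A zip archive is a collection of compressed files, while a gzipped tar is a compressed collection of uncompressed files. A zip archive is therefore a randomly accessible sequence of individually compressed items, indexed by a catalog (the central directory) stored at the end of the file. A .tar.gz has no separate catalog: its file headers are interleaved with the data inside the compressed stream, so the archive must be decompressed from the start, all the way through, before you can list everything it contains.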
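One rough way to see the catalog difference is to count how many bytes each library pulls out of the archive. The sketch below uses only the Python 3.9+ standard library; `CountingReader`, the helper functions, and the test data are hypothetical. It builds both archives in memory from 20 files of random data. It then reads a single member from the zip and lists the member names in the .tar.gz.

```python
import io
import random
import tarfile
import zipfile


class CountingReader(io.BytesIO):
    """In-memory file that tallies how many bytes are pulled out via read()."""

    def __init__(self, data: bytes):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def build_archives(files: dict[str, bytes]) -> tuple[bytes, bytes]:
    """Return (tar.gz bytes, zip bytes) for the same set of files."""
    tgz_buf, zip_buf = io.BytesIO(), io.BytesIO()
    with tarfile.open(fileobj=tgz_buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    with zipfile.ZipFile(zip_buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return tgz_buf.getvalue(), zip_buf.getvalue()


def bytes_read_for_one_zip_member(zip_data: bytes, name: str) -> int:
    src = CountingReader(zip_data)
    with zipfile.ZipFile(src) as zf:
        # Opening reads the central directory at the end of the file,
        # then zf.read() seeks directly to the requested entry.
        zf.read(name)
    return src.bytes_read


def bytes_read_to_list_tgz(tgz_data: bytes) -> int:
    src = CountingReader(tgz_data)
    with tarfile.open(fileobj=src, mode="r:gz") as tar:
        # No index: each header is only reachable by decompressing
        # everything that precedes it in the gzip stream.
        tar.getnames()
    return src.bytes_read


rng = random.Random(0)  # hypothetical test data: 20 files of random bytes
files = {f"file{i:02d}.bin": rng.randbytes(4096) for i in range(20)}
tgz_data, zip_data = build_archives(files)

zip_read = bytes_read_for_one_zip_member(zip_data, "file10.bin")
tar_read = bytes_read_to_list_tgz(tgz_data)
print(f"zip:    read {zip_read} of {len(zip_data)} bytes to extract one file")
print(f"tar.gz: read {tar_read} of {len(tgz_data)} bytes to list names")
```

The exact counts depend on how your Python version buffers its reads, but the pattern should hold. The zip reader touches roughly the central directory plus the one entry it needs. Listing the .tar.gz pulls in essentially the whole compressed stream.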

  • The caveat of a zip is that you don’t get compression across files. Each file is compressed independently of the others in the archive, so the compression cannot take advantage of similarities among the contents of different files. The advantage is that you can access any file by reading only the section of the archive that holds it (plus the catalog), because the “catalog” of the collection is stored separately from the collection itself.
  • The caveat of a .tar.gz is that the files live inside the compressed tarball. To reach any file, you must decompress the archive from the beginning up to that file. Getting at the last file, or listing them all, means decompressing the whole thing. The advantage is that the compression can take advantage of similarities among the files, because it operates on the whole tarball at once.
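The cross-file effect can be checked with a small, deliberately extreme sketch. It uses only the Python 3.9+ standard library, and the helper names and test data are hypothetical. The input is 50 files that all contain the same 2 KiB of pseudo-random bytes. Random bytes barely compress on their own, so a zip (which compresses each entry independently) stays at least as large as the input. Gzip running over the whole tarball can instead replace every repeat with a back-reference to the earlier copy.

```python
import io
import random
import tarfile
import zipfile


def tar_gz_size(files: dict[str, bytes]) -> int:
    """Size of a .tar.gz: one gzip stream over the whole tarball."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


def zip_size(files: dict[str, bytes]) -> int:
    """Size of a zip: each entry DEFLATE-compressed on its own."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return len(buf.getvalue())


# Hypothetical, deliberately extreme input: 50 copies of one 2 KiB random blob.
blob = random.Random(1).randbytes(2048)
files = {f"copy{i:02d}.bin": blob for i in range(50)}

tgz = tar_gz_size(files)
zsz = zip_size(files)
print(f"input {50 * len(blob)} bytes -> tar.gz {tgz}, zip {zsz}")
```

Keep one limit in mind: gzip (DEFLATE) only looks back 32 KiB. In a .tar.gz, the cross-file benefit therefore applies only to similar content that sits within that distance in the tar stream. Identical files far apart in a large archive do not help each other.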
