The UnzipMe page, for working with compressed files.

The sections of this page are:

Q: What's the purpose for compressing files?
Now Let's get to practice!
Finalizing



Q: What's the purpose for compressing files?


A: One reason that covers many: storage simplicity.

Storage simplicity comes for free when we compress many files, or directories containing files, into a single compressed file.

Space is saved by definition: we compress, right?

These reasons bring practical advantages:

Packaging reasons: When we move a compressed file to or from a destination, we only have to worry about one file, the compressed one, and not about a horde of files that we would have to download and place correctly in the filesystem tree again.

Bandwidth reasons: The compressed (smaller) file takes less time to travel between destinations.

But the packaging reason is so strong that we may often decide to pack files together even if the output file is larger than the original files put together!

This is common when the files being packed are already compressed individually.
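
A quick way to see that last point for yourself is sketched below. It assumes that random bytes from /dev/urandom are a fair stand-in for an already-compressed file (both look like noise to gzip); the file names are made up for the illustration.

```shell
# Work in a throwaway directory so nothing in the current one is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 100 KiB of random bytes stands in for an already-compressed file.
head -c 102400 /dev/urandom > random.bin

# Compress a copy to standard output, leaving the original in place.
gzip -c random.bin > random.bin.gz

# Compare the two sizes in bytes.
orig_size=$(wc -c < random.bin)
gz_size=$(wc -c < random.bin.gz)
echo "original: $orig_size bytes, gzipped: $gz_size bytes"
```

The gzipped copy should come out slightly larger, because gzip finds nothing to squeeze and still has to add its own header and bookkeeping.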

There are two main ways for compressing a file:
With loss and without loss.

Without loss (lossless method): During compression, a program applies an algorithm that parses the data in one or many files, finds repeated data, stores it only once, and "remembers" how to invert the procedure, so that the original can be rebuilt exactly.
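
The lossless round trip can be checked with a small sketch like the following. It uses gzip as one example of a lossless compressor; the file names and the repeated line are just illustrations.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A very repetitive text file, so there are plenty of similarities to find.
i=0
while [ "$i" -lt 1000 ]; do
    echo "the same line again and again"
    i=$((i + 1))
done > original.txt

# Compress to a new file, then decompress that into a second copy.
gzip -c original.txt > original.txt.gz
gunzip -c original.txt.gz > restored.txt

# cmp -s prints nothing and succeeds only if the two files are identical.
if cmp -s original.txt restored.txt; then
    result="identical"
else
    result="different"
fi
echo "$(wc -c < original.txt) bytes became $(wc -c < original.txt.gz) bytes; restored copy is $result"
```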

With loss (lossy method): When keeping the file exactly like the original is not the main goal, certain methods reduce the quality by a specified rate. This is very useful when, for example, keeping a quality of 70% creates a file 20% the size of the original one. It is purely a trade of quality for size.

Lossless method common use:
Binary and text files.
Therefore:
  Documents, spreadsheets, databases
  Text files
  Executables
  Libraries
  Projects in source code
Lossy method common use:
Media files in general: picture, sound and video files.

Broadcast technology: in the case of a live event, the "file" obviously has no beginning or end; it is just data being received.

Keep in mind that the "endless" file is not a modern invention.
Its roots go back to Unix!

Q: I don't believe you! Give me an example!

OK. When any shell starts, it simply reads data from the standard input, which is handled like a file, as is everything in *nix.

When you press Ctrl-D on an empty line, it acts like the end-of-file condition that terminates reading in any program that reads files.

That's what it does to the standard input: it ends the file, so the shell exits and you go back to the login screen, or the window closes in the case of a GUI terminal, just as if you had typed logout or exit!
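
The same behaviour can be seen without a keyboard. In this small sketch, the standard input of a new sh is a pipe instead of a terminal; the two echo commands are just placeholders.

```shell
# The shell reads two commands from its standard input. When the pipe is
# exhausted it sees end-of-file and exits on its own, just as an
# interactive shell does after Ctrl-D.
output=$(printf 'echo first\necho second\n' | sh)
status=$?

echo "$output"
echo "shell exited with status $status"
```

Nobody typed exit here: running out of input was enough to end the shell.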




Now Let's get to practice!

Q: Yes please! I've seen too many *.tgz, *.tar.gz and *.tar.bz2 extensions.
A: The common types are the ones mentioned just above.

The double extension means that the file is made in two steps! (*.tgz is just a short form of *.tar.gz.)

First step: Archiving
The files are simply joined into a single file that remembers each file's relative subdirectory position. This creates a Tape Archive (tar) file.
Second step: Compressing
The single file is then compressed with one of the many lossless methods that exist. Usually the uncompressed "Tape Archive" is erased once the compression is done, since it is no longer needed: the compressed file is smaller.

When uncompressing, think of the exact opposite: first uncompress, then unarchive.

Q: I don't see any practice yet!
A: Coming now.

.zip .rar .lha etc.
These files are handled the way we know from other systems.
Use zip and unzip, lha, and rar.
In these cases, archiving and compressing are done in a single step.

Simple Tar and gzip - or -  bzip2
1)  Making separate steps

1a)  A directory, for example "dir1" in your home directory (~):


cd           # go to our home directory
mkdir dir1   # create an empty subdirectory

Compressing:

tar -cf dir1.tar dir1/ && gzip dir1.tar

Notice that gzip replaced dir1.tar with dir1.tar.gz.
Both gzip and bzip2 behave this way unless instructed otherwise.
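
One way of "instructing otherwise" is the -c option, which writes the compressed data to standard output and leaves the input file alone. A minimal sketch with gzip, run in a scratch directory so that your real dir1 is not touched:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir dir1
tar -cf dir1.tar dir1

# Default behaviour: dir1.tar is replaced by dir1.tar.gz.
gzip dir1.tar
ls

# Bring the tar file back, then compress to standard output instead:
# with -c the input file is left untouched.
gunzip dir1.tar.gz
gzip -c dir1.tar > dir1.tar.gz
ls
```

The second listing shows both dir1.tar and dir1.tar.gz.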

Uncompressing:

rm -r dir1
gunzip dir1.tar.gz && tar -xf dir1.tar

The directory will be created with its contents.
You can "compress" empty directory trees too.


When uncompressing, first make sure that the tar file has an initial subdirectory in it (tar -tf lists the contents), otherwise you might end up with a horde of files scattered in your home directory!
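
A cautious routine could look like the sketch below: list first, and if there is no enclosing directory, extract into a directory of your own choosing with -C. The names flat.tar and unpacked are only examples.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a "flat" archive: three files with no enclosing directory.
touch 1.txt 2.txt 3.txt
tar -cf flat.tar 1.txt 2.txt 3.txt
rm 1.txt 2.txt 3.txt

# List the contents first. Entries without a leading "name/" would be
# unpacked straight into the current directory.
tar -tf flat.tar

# Give such an archive a directory of its own and extract there with -C.
mkdir unpacked
tar -xf flat.tar -C unpacked
ls unpacked
```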

1b)  Separate files, e.g. "1.txt", "2.txt" and "3.txt"

touch 1.txt 2.txt 3.txt   # create empty files

Compressing:
tar -cf 123.tar [1-3].txt   # notice the expression: the shell expands it to 1.txt 2.txt 3.txt

The rest is trivial: you can use gzip or bzip2 for compressing.

Uncompressing:
Assuming that you still have the tar file, or that you recreated it with bunzip2 or gunzip:
rm [1-3].txt
tar -xf 123.tar

2)  Making a single step.

tar can handle both steps by itself if the z (gzip) or j (bzip2) option is added.

2a) Directory

Compressing:

rm -f dir1.tar.gz dir1.tar.bz2   # just in case :)
tar czf dir1.tar.gz  dir1   # a gzipped file will be created
tar cjf dir1.tar.bz2 dir1   # a bzipped file will be created

Be careful that the z or j option matches the actual compression of the file.

Uncompressing:

tar xzf dir1.tar.gz or tar xjf dir1.tar.bz2

2b) Separate files

Compressing:

tar czf 123.tar.gz  [1-3].txt   # a gzipped file will be created
tar cjf 123.tar.bz2 [1-3].txt   # a bzipped file will be created

Uncompressing:

rm [1-3].txt
tar xzf 123.tar.gz or tar xjf 123.tar.bz2

Be careful that the z or j option matches the actual compression of the file.
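
If you would rather not think about which letter to use, a small shell function can choose it from the file name. This is only a sketch: extract_tarball is a hypothetical helper name, not a standard command, and it trusts the extension to tell the truth.

```shell
# extract_tarball is a made-up helper name for this illustration.
extract_tarball() {
    case "$1" in
        *.tar.gz|*.tgz) tar -xzf "$1" ;;   # gzip-compressed archive
        *.tar.bz2)      tar -xjf "$1" ;;   # bzip2-compressed archive
        *.tar)          tar -xf "$1" ;;    # plain, uncompressed archive
        *) echo "unknown archive type: $1" >&2; return 1 ;;
    esac
}

# Small demonstration in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch 1.txt 2.txt 3.txt
tar -czf 123.tar.gz 1.txt 2.txt 3.txt
rm 1.txt 2.txt 3.txt
extract_tarball 123.tar.gz
ls
```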


Advanced Tar and gzip - or -  bzip2
Let's now see the Unix way.

These programs read from the standard input and write to the standard output, so they can be combined with pipes and redirections into complicated shell commands.

Advanced commands:

Assuming that 123.tar.bz2 exists and 1.txt, 2.txt, 3.txt don't:

tar -xj < 123.tar.bz2 is like
tar -xjf 123.tar.bz2


Assuming that [1-3].txt files exist and 123.tar.bz2 does not:

tar cjf 123.tar.bz2 [1-3].txt  is like
tar cj [1-3].txt > 123.tar.bz2

Now watch this:
tar -cf - [1-3].txt | gzip > 123.tar.gz
Here, tar archives the files and writes the archive to its standard output ("-f -" names it explicitly), which the pipe feeds to gzip's standard input. gzip, in its turn, compresses what it reads and writes the result to its own standard output, which is redirected into a file.

The output could just as well be piped on to an encoding program and mailed to a friend, etc.
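
As a sketch of that idea, the pipeline below adds base64 as the encoding program, so the result is plain text that survives being pasted into a mail. It assumes a base64 command that accepts -d for decoding (as the GNU coreutils one does); the file and directory names are examples.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "hello" > 1.txt
touch 2.txt 3.txt

# Sender: archive -> compress -> encode as plain text, in one pipeline.
tar -cf - 1.txt 2.txt 3.txt | gzip -c | base64 > 123.tar.gz.b64

# Receiver: the same pipeline in reverse.
# decode -> uncompress -> unarchive into a separate directory.
mkdir received
base64 -d < 123.tar.gz.b64 | gunzip -c | tar -xf - -C received
cat received/1.txt
```

No intermediate .tar or .tar.gz file is ever written to disk on either side.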

Archive listing:
tar -tf 123.tar or tar -t < 123.tar lists the archived files.
tar -tzf 123.tar.gz lists the files in a compressed archive as well.

Appending:

With the option -r, tar appends files to the end of an existing, uncompressed tar file. Use -r on its own, not together with -c:

tar -rf 123.tar [4-6].txt && mv 123.tar 123456.tar

At the end, we rename the tar file if and only if tar exits successfully.
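
The whole appending sequence, from empty files to the final listing, might look like this sketch (run in a scratch directory; it uses -r by itself to append):

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch 1.txt 2.txt 3.txt 4.txt 5.txt 6.txt

# Start with an archive of the first three files.
tar -cf 123.tar 1.txt 2.txt 3.txt

# Append the other three; rename only if tar reports success.
tar -rf 123.tar 4.txt 5.txt 6.txt && mv 123.tar 123456.tar

# List the result: all six names, in the order they were added.
tar -tf 123456.tar
```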




Finalizing
tar can also be combined with zip and unzip.

The tar-gzip/bzip2 combination can store empty directories. So you can use it just for storing directory trees.
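
A short sketch of storing nothing but a directory tree; the directory names are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A directory tree with no files in it at all.
mkdir -p tree/a/b tree/c

tar -czf tree.tar.gz tree
rm -r tree

# Extraction recreates every directory, empty or not.
tar -xzf tree.tar.gz
find tree -type d
```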

You should definitely consult the man pages of all the programs mentioned on this page.