Paragraphs in this page are:
Q: What's the purpose for compressing files?
Now Let's get to practice!
Finalizing
Q: What's the purpose for compressing files?
A: For more than one reason. The first is storage simplicity.
Storage simplicity comes for free when we pack many files, or directories containing files, into a single compressed one.
Space is saved by definition: we compress, right?
These two reasons lead to the practical advantages of having data compressed.
Packaging reasons: When we move a compressed file from or to a destination, we only have to worry about one file, the compressed one, and not about a horde of files that we would have to download and place correctly in the filesystem tree again.
Bandwidth reasons: The compressed (smaller) file takes less time to travel between destinations.
But the packaging reason is so strong that we may often decide to pack files together even if the output file is larger than the individual ones put together!
This is common when packing files that are already individually compressed.
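To see why already-compressed input can grow, here is a small sketch. It assumes a system with /dev/urandom, head, gzip and wc, and the file names are made up for the illustration. Random bytes stand in for already-compressed data, since both have almost no redundancy left to remove.

```shell
# Hypothetical scratch directory and file names, for illustration only.
work=$(mktemp -d)
cd "$work" || exit 1

# 100000 random bytes: no patterns for gzip to exploit.
head -c 100000 /dev/urandom > random.bin

# -c writes to standard output, so random.bin itself is kept.
gzip -c random.bin > random.bin.gz

# $(( ... )) strips any padding that wc may print around the number.
orig_size=$(( $(wc -c < random.bin) ))
gz_size=$(( $(wc -c < random.bin.gz) ))
echo "original: $orig_size bytes, gzipped: $gz_size bytes"
```

The gzipped copy comes out slightly larger, because gzip still has to add its own header and bookkeeping around data it could not shrink.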
There are two main ways of compressing a file: with loss and without loss.
Without loss (lossless method): During compression, a program applies certain algorithms to parse the data within one or many files, find similarities, keep the similar data only once, and "remember" how to invert the procedure.
With loss (lossy method): When keeping the file exactly like the original is not the main goal, certain methods decrease the quality at a specified rate. This is very useful when, for example, keeping 70% of the quality produces a file 20% the size of the original one. This is purely a rate advantage: size is traded against quality.
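The lossless case can be checked by hand. The sketch below assumes gzip, gunzip and cmp are installed; the file names are invented for the example. It compresses a very repetitive file, inverts the procedure, and compares the result with the original byte by byte.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

# A repetitive text file: lots of similar data that can be kept once.
i=0
while [ "$i" -lt 1000 ]; do
    echo "the same line again and again"
    i=$((i + 1))
done > sample.txt

gzip -c sample.txt > sample.txt.gz       # compress, keep the original
gunzip -c sample.txt.gz > restored.txt   # invert the procedure

# cmp -s is silent and succeeds only if both files are byte-identical.
if cmp -s sample.txt restored.txt; then
    roundtrip=identical
else
    roundtrip=different
fi
echo "round trip: $roundtrip"
echo "size: $(( $(wc -c < sample.txt) )) -> $(( $(wc -c < sample.txt.gz) )) bytes"
```

A lossy method would fail this comparison on purpose, which is why it is kept for media and never used for executables or documents.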
Lossless method, common use:
Binary and text files. Therefore:
Documents, spreadsheets, databases
Text files
Executables
Libraries
Projects in source code
Lossy method, common use:
Media files in general: pictures, sound and video files.
Broadcast technology, where, in the case of a live event, the "file" obviously has no beginning or end: it is just data being received.
Keep in mind that the "endless" file is not a modern invention. Its roots go back to Unix!
Q: I don't believe you! Give me an example!
A: OK. When any shell starts, it simply reads data from the standard input, which is handled like a file, as is everything in *nix.
When you press Ctrl-D on an empty line, it acts like the end-of-file condition that would terminate reading in any file-handling program.
That's what it does to the standard input: it ends the file, so the shell exits and you go back to the login screen, or the window closes in the case of a GUI terminal, just as if you had typed logout or exit!
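The same behaviour can be watched without touching the keyboard. In this sketch (assuming a POSIX sh is available as "sh") a child shell gets its commands through a pipe instead of a terminal. When the pipe runs dry, the child sees end-of-file and exits on its own, with no exit or logout command anywhere.

```shell
# The child shell reads two commands from its standard input.
# After the second one the input ends, and so does the shell.
output=$(printf 'echo first\necho second\n' | sh)
status=$?

echo "$output"
echo "child shell exit status: $status"
```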
Now Let's get to practice!
Q: Yes please! I've seen too many *.tgz, *.tar.gz and *.tar.bz2 extensions.
A: The common types are the ones mentioned just above.
The double extension means that the file is produced in two steps!
First step: Archiving
The files are simply joined into a single one, which remembers each file's relative subdirectory position. This is like creating a Tape Archive (tar) file.
Second step: Compressing
The single file is then compressed with one of the many lossless methods that exist. Usually, once compression is done, the "Tape Archive" is erased, since it is no longer needed and the compressed file is smaller.
When uncompressing, think the exact opposite: first uncompress, then unarchive.
Q: I don't see any practice yet!
A: Coming now.
.zip .rar .lha etc.
These files are handled the way we know from other systems.
Use zip and unzip, lha, and rar.
In these cases, archiving and compressing (or the reverse) are done in a single step.
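As a rough sketch of the single-step style, here is a round trip with zip and unzip. These two programs are often not installed by default, so the example checks for them first and skips itself otherwise. The directory and file names are made up.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
mkdir docs
echo "hello" > docs/a.txt

if command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1; then
    zip -r -q docs.zip docs   # -r: recurse into directories, -q: quiet
    rm -r docs
    unzip -q docs.zip         # unarchive and uncompress in one step
    zip_demo=done
else
    zip_demo=skipped
fi
echo "zip demo: $zip_demo"
```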
Simple tar and gzip or bzip2
1) In separate steps
1a) A directory, for example "dir1" in your home directory:
cd                # go to our home directory
mkdir dir1        # create an empty subdirectory
Compressing:
tar -cf dir1.tar dir1/ && gzip dir1.tar
Notice that gzip replaced dir1.tar with dir1.tar.gz.
Both gzip and bzip2 behave the same way, unless instructed otherwise.
Uncompressing:
rm -r dir1
gunzip dir1.tar.gz && tar -xf dir1.tar
The directory will be created with its contents.
You can "compress" empty directory trees too.
When uncompressing, be sure that the tar file has an initial subdirectory in it, otherwise you might end up with a horde of files placed in your home directory!
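One way to avoid that horde is to look inside the tar file before unpacking it. The sketch below builds a deliberately "flat" archive (the names flat.tar and unpacked are invented here), lists it with -t, and then extracts it into a dedicated directory with -C instead of the current one.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

# Build a "flat" archive: three files with no leading directory.
touch 1.txt 2.txt 3.txt
tar -cf flat.tar 1.txt 2.txt 3.txt

# List the members first. Names without a "dir/" prefix would land
# directly in whatever directory we extract in.
tar -tf flat.tar

# Extract into a dedicated directory instead of the current one.
mkdir unpacked
tar -xf flat.tar -C unpacked
ls unpacked
```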
1b) Separate files, i.e. "1.txt", "2.txt" and "3.txt"
touch 1.txt 2.txt 3.txt        # create empty files
Compressing:
tar -cf 123.tar [1-3].txt      # notice the expression!
The rest is trivial: you can use gzip or bzip2 for compressing.
Uncompressing:
Assuming that you still have the tar file, or that you recreated it with bunzip2 or gunzip:
rm [1-3].txt
tar -xf 123.tar
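About that expression: it is the shell, not tar, that turns [1-3].txt into a list of file names before the command even starts. A quick way to preview what a pattern will match is to hand it to echo first. The extra file 4.txt is added here only to show that it is left out.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
touch 1.txt 2.txt 3.txt 4.txt

# [1-3] matches exactly one character in the range 1 to 3,
# so 4.txt is not part of the expansion.
matched=$(echo [1-3].txt)
echo "$matched"    # prints: 1.txt 2.txt 3.txt
```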
2) In a single step
tar can handle both steps if the z (gzip) or j (bzip2) option is added.
2a) Directory
Compressing:
rm -f dir1.tar.gz dir1.tar.bz2      # just in case :)
tar czf dir1.tar.gz dir1            # a gzipped file will be created
tar cjf dir1.tar.bz2 dir1           # a bzipped file will be created
Careful: the z or j option must match the actual compressed file.
Uncompressing:
tar xzf dir1.tar.gz or tar xjf dir1.tar.bz2
2b) Separate files
Compressing:
tar czf 123.tar.gz [1-3].txt        # a gzipped file will be created
tar cjf 123.tar.bz2 [1-3].txt       # a bzipped file will be created
Uncompressing:
rm [1-3].txt
tar xzf 123.tar.gz or tar xjf 123.tar.bz2
Careful: the z or j option must match the actual compressed file.
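To avoid mixing up z and j, the choice can be left to a small shell function that looks at the extension. This is only a sketch: extract_any is a hypothetical helper name, not a standard command, and it trusts the file name rather than inspecting the file contents.

```shell
# extract_any is a made-up helper, for illustration only.
extract_any() {
    case "$1" in
        *.tar.gz|*.tgz) tar -xzf "$1" ;;   # gzip  -> z
        *.tar.bz2)      tar -xjf "$1" ;;   # bzip2 -> j
        *.tar)          tar -xf  "$1" ;;   # plain archive, no extra option
        *) echo "unknown archive type: $1" >&2; return 1 ;;
    esac
}

work=$(mktemp -d)
cd "$work" || exit 1
mkdir dir1
echo "data" > dir1/file.txt
tar -czf dir1.tar.gz dir1
rm -r dir1

extract_any dir1.tar.gz
ls dir1
```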
Advanced tar and gzip or bzip2
Let's now see the Unix way.
These programs work with the standard input and output, and can be combined with pipes and redirections in complicated shell commands.
Advanced commands:
In the commands below, "-f -" names the standard input or output as the archive. Some tar builds default to it when -f is missing, others default to a tape device, so it is safer to spell it out.
Assuming that 123.tar.bz2 exists and 1.txt, 2.txt, 3.txt don't:
tar -xjf - < 123.tar.bz2
is like
tar -xjf 123.tar.bz2
Assuming that the [1-3].txt files exist and 123.tar.bz2 does not:
tar cjf 123.tar.bz2 [1-3].txt
is like
tar cjf - [1-3].txt > 123.tar.bz2
Now watch this:
tar -cf - [1-3].txt | gzip > 123.tar.gz
Here, tar archives the files and passes the output to the standard input of gzip, which in turn writes the compressed data to its standard output, and the shell redirects that into a file.
Instead of going to a file, the output could be passed on to an encoding program, mailed to a friend, etc.
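Here is the pipe idea as a complete round trip, under the assumption that tar, gzip and gunzip are installed. The archive is named explicitly as "-" (standard output or input), and the second pipeline runs the two steps in the opposite order: uncompress first, then read the archive.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
echo one > 1.txt; echo two > 2.txt; echo three > 3.txt

# tar writes the archive to standard output, gzip compresses what it
# reads from standard input, and the shell puts the result in a file.
tar -cf - [1-3].txt | gzip -c > 123.tar.gz

# The reverse pipeline: uncompress first, then list the archive.
listing=$(gunzip -c 123.tar.gz | tar -tf -)
echo "$listing"
```

No intermediate 123.tar file ever touches the disk, which is the main attraction of doing it this way.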
Archive listing:
tar -tf 123.tar or tar -tf - < 123.tar lists the archived files.
tar -tzf 123.tar.gz lists the contents of the compressed archive too.
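Since the listing is plain text with one name per line, it combines nicely with other tools. A small sketch, assuming grep and wc are available: check whether a given file is inside an archive, and count the members, all without extracting anything.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
touch 1.txt 2.txt 3.txt
tar -czf 123.tar.gz [1-3].txt

# grep -x matches a whole line, -q keeps it quiet.
if tar -tzf 123.tar.gz | grep -qx '2.txt'; then
    found=yes
else
    found=no
fi
echo "2.txt in archive: $found"

# Count the members of the archive.
count=$(( $(tar -tzf 123.tar.gz | wc -l) ))
echo "members: $count"
```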
Appending:
With the -r option, tar appends files to the end of an existing tar file (do not combine it with -c, which would create a new one):
tar -rf 123.tar [4-6].txt && mv 123.tar 123456.tar
At the end we rename the tar file if and only if tar exits successfully.
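A short sketch of appending, using -r on its own against an existing archive and then listing the result to confirm. The six empty files are created only for the demonstration.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
touch 1.txt 2.txt 3.txt 4.txt 5.txt 6.txt
tar -cf 123.tar [1-3].txt

# -r appends to an existing, uncompressed archive.
# mv runs only if tar exited with status 0.
tar -rf 123.tar [4-6].txt && mv 123.tar 123456.tar

members=$(( $(tar -tf 123456.tar | wc -l) ))
echo "members after append: $members"
```

Appending works on plain tar files only. A .tar.gz or .tar.bz2 has to be uncompressed first, appended to, and compressed again.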
Finalizing
tar can work with
zip and unzip.
The combination of tar with gzip or bzip2 can store empty directories. So you can use compression just for storing directory trees.
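For instance, a skeleton of empty directories can be packed and restored like this. The tree layout is invented for the example.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

# A tree of directories with no files in it.
mkdir -p tree/a/b tree/c

tar -czf tree.tar.gz tree
rm -r tree
tar -xzf tree.tar.gz

# Show the restored directories.
find tree -type d | sort
```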
You should definitely consult the man pages of all programs mentioned in this page.