MaxPacker version 1.20 final release (100% version for DECAY!)
INTRODUCTION:
Well, this is it: the final 100% version release.
Only one major bugfix! It occurred when users tried packing
already compressed files. Now, when this happens and as a result
nothing can be made smaller within a 16 kb distance, you will see
a message telling you of the problem. This should rarely happen,
however, since gains can usually still be made even on files
packed with other types of compression.
This program is written in Assembler for use on the Amiga only!
The current version of MaxPacker is fashioned after that of LHARC,
although it is designed to operate more like TetraPacker.
Testing has revealed that this version packs 40% faster than
TetraPacker, and depacking is also over 200% faster!!!
The average rates are as follows - Pack:   22 kilobytes per minute
                                    DePack: 67 kilobytes per second
Please NOTE: The following information is derived in part from
an early ARJ document. I sought to include the applicable info
here so as to enlighten users a bit more on how file compression
has evolved over the years...
So the following is a brief synopsis of the state of the art in
data compression research. MaxPacker does not use any of the
proprietary algorithms quoted in the text below. In fact the
algorithm used is not patented; like ARJ, LHARC, etc., it is
based on both Lempel-Ziv and Huffman encoding to achieve optimum
results.
TERMINOLOGY:
COMPRESSION - The process of encoding data so that redundant
information takes up less storage space.
COMPRESSION PERCENTAGE/RATIO - The percentage compression reported
by MaxPacker is a variation of one of the TWO standard methods of
expressing compression ratio in the technical literature. Max uses
the compressed size / original size ratio. The other method is the
inverse ratio. When Max reports 95% as the compression ratio, that
means that the compressed file is 95 percent of the original size
(very little compression). Other archivers use their own methods;
LHARC v1.30, for example, uses the inverse ratio, the opposite of
MaxPacker's convention.
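As a quick illustration, here is a rough Python sketch (purely for
illustration - MaxPacker itself is written in Assembler, and this is
not its code) of how the two conventions relate, assuming the
"inverse" ratio means the percentage of space saved:

  # Hypothetical helper, for illustration only.
  def ratios(original_size, compressed_size):
      # MaxPacker-style ratio: compressed size as a percentage of the
      # original size (lower is better; 95% means little compression).
      max_style = 100.0 * compressed_size / original_size
      # The other common convention: percentage of space saved
      # (higher is better).
      inverse_style = 100.0 - max_style
      return max_style, inverse_style

  print(ratios(100000, 95000))   # -> (95.0, 5.0)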
RECENT DATA COMPRESSION RESEARCH:
The following information is explained in more detail in the book,
TEXT COMPRESSION by Timothy Bell, John Cleary and Ian Witten
published by Prentice Hall, 1990. ISBN 0-13-911991-4.
Three of the most useful types of data compression involve context
modeling, Lempel-Ziv 1977, and Lempel-Ziv 1978.
Context models use the preceding few characters to PREDICT
(estimate the probability of) the next character. For example,
what is the next character after "tabl"? There is a high
probability that the character is "e". Much of today's research is
focused on these types of data compressors because they provide the
best "TEXT" compression results. The disadvantage of this family
of compressors is the amount of memory required. Three practical
models require from 500,000 to 5,000,000 bytes of memory.
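As a rough illustration of the idea (not the code of any actual
compressor), a context model can be as simple as a table counting
which character followed each context in the text seen so far:

  # Minimal order-4 context model sketch (illustration only).
  from collections import Counter, defaultdict

  def build_model(text, order=4):
      model = defaultdict(Counter)
      for i in range(order, len(text)):
          context = text[i - order:i]
          model[context][text[i]] += 1      # count the follower
      return model

  def predict(model, context):
      counts = model.get(context)
      if not counts:
          return {}
      total = sum(counts.values())
      return {ch: n / total for ch, n in counts.items()}

  model = build_model("the table and the tablet lay on the table top")
  print(predict(model, "tabl"))      # 'e' gets the highest probability

Practical context-model compressors typically feed such probability
estimates into an arithmetic coder; the memory cost comes from
storing counts for the huge number of possible contexts.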
More familiar is the Lempel-Ziv 1978 (LZ78) family of data
compressors. Both this family and Lempel-Ziv 1977 are classified
as adaptive dictionary encoders. In these methods, text strings
are replaced by pointers to previous occurrences of duplicate
strings. For example, the words in this document could be
represented by dictionary page and line numbers. The significant
characteristic of LZ78 algorithms is the parsing of previous text
data into PHRASES that are stored in a dictionary. Pointers are
allowed to reference such phrases; however, pointers are not
allowed to reference substrings of such phrases.
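A rough Python sketch of LZ78-style phrase parsing (illustration
only; none of the packers named here necessarily implement it this
way):

  # LZ78-style parsing sketch: each output token is a pointer to a
  # previously stored phrase plus one new character.  Pointers refer
  # to whole phrases, never to substrings of them.
  def lz78_parse(data):
      dictionary = {"": 0}          # phrase -> index; 0 = empty phrase
      tokens = []
      phrase = ""
      for ch in data:
          if phrase + ch in dictionary:
              phrase += ch          # keep extending the current phrase
          else:
              tokens.append((dictionary[phrase], ch))
              dictionary[phrase + ch] = len(dictionary)
              phrase = ""
      if phrase:
          tokens.append((dictionary[phrase], ""))
      return tokens

  print(lz78_parse("abababababa"))
  # -> [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a'), (2, 'a'), (5, '')]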
Lempel-Ziv-Welch, 1984, (ARC shrinking, crushing, UNIX COMPRESS) is
the most familiar of this family. Numerous articles have been
published concerning this method. Unknown to many is the fact that
this algorithm is covered by a US patent owned by UNISYS. These
algorithms are FAST. Their disadvantages include the problem of
handling a FULL dictionary when the input text data no longer
matches the contents of the dictionary and the poor compression of
non-text data.
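For comparison, a toy LZW encoder sketch with a crude "throw the
dictionary away when it fills up" policy, just to illustrate the
FULL dictionary problem mentioned above (real implementations
handle this more carefully; this is illustration only):

  # Toy LZW encoder sketch (illustration only).
  MAX_CODES = 4096                      # e.g. 12-bit codes

  def lzw_encode(data):
      dictionary = {bytes([i]): i for i in range(256)}
      codes = []
      phrase = b""
      for byte in data:
          candidate = phrase + bytes([byte])
          if candidate in dictionary:
              phrase = candidate
          else:
              codes.append(dictionary[phrase])
              if len(dictionary) < MAX_CODES:
                  dictionary[candidate] = len(dictionary)
              else:
                  # Dictionary full: start over.  Compression suffers
                  # until the dictionary adapts to the data again.
                  dictionary = {bytes([i]): i for i in range(256)}
              phrase = bytes([byte])
      if phrase:
          codes.append(dictionary[phrase])
      return codes

  print(lzw_encode(b"TOBEORNOTTOBEORTOBEORNOT"))  # list of integer codes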
Lempel-Ziv 1977 (LZ77) algorithms have recently come into popular
practical use (LHARC, PKZIP 1.10, ARJ, PAK). In the original LZ77
scheme, pointers are allowed to reference any phrase in a
fixed-size window that precedes the current phrase. A matched
current phrase is replaced by the pointer to the previously
occurring duplicate phrase. This pointer in LZ77 consists of an
offset into the window and the length of the phrase. The original
implementation was considered impractical as it used a brute force
string searching method to find duplicates. It was quite slow,
requiring up to N character comparisons in an N-sized window for
each phrase looked for.
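A crude Python sketch of that original brute-force search
(illustration only; practical packers use the tree-based lookups
described next):

  # Brute-force LZ77 match search over a fixed-size window.  For each
  # position it scans the whole preceding window, which is exactly why
  # the original scheme was considered impractical.
  WINDOW = 4096
  MIN_MATCH = 3

  def find_match(data, pos):
      best_len, best_offset = 0, 0
      for cand in range(max(0, pos - WINDOW), pos):
          length = 0
          while (pos + length < len(data)
                 and data[cand + length] == data[pos + length]):
              length += 1
          if length > best_len:
              best_len, best_offset = length, pos - cand
      if best_len >= MIN_MATCH:
          return best_offset, best_len    # the pointer: (offset, length)
      return None                         # emit a literal instead

  data = b"abcabcabcabc"
  print(find_match(data, 3))    # -> (3, 9); the copy overlaps itself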
However, in 1987, Timothy Bell proposed using a binary tree to
index into the window to speed the lookup of phrases. This
provided an order of magnitude speed increase. In the same year,
Brent published an algorithm that used a secondary compression
method (Huffman encoding, 1952) to further encode the output of an
LZ77 encoder. Huffman encoding is the substitution of variable
bit-length codes for standard codes (8-bit ASCII) based upon
frequency of occurrence. The most frequent codes have the shortest
bit-lengths.
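A small Python sketch of classic static Huffman code construction
from symbol frequencies (illustration only, not the code of any of
the archivers discussed here):

  # Build Huffman codes by repeatedly merging the two least frequent
  # subtrees.  Frequent symbols end up with the shortest bit patterns.
  import heapq
  from collections import Counter

  def huffman_codes(data):
      freq = Counter(data)
      # Heap entries: (frequency, tie-breaker, {symbol: code-so-far})
      heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(freq.items())]
      heapq.heapify(heap)
      tie = len(heap)
      while len(heap) > 1:
          f1, _, left = heapq.heappop(heap)
          f2, _, right = heapq.heappop(heap)
          merged = {s: "0" + c for s, c in left.items()}
          merged.update({s: "1" + c for s, c in right.items()})
          heapq.heappush(heap, (f1 + f2, tie, merged))
          tie += 1
      return heap[0][2] if heap else {}

  print(huffman_codes(b"aaaabbc"))   # 'a' gets the shortest code

Dynamic (adaptive) Huffman coding, mentioned below, adjusts such a
table on the fly as symbols are seen, instead of computing it once
up front.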
LHARC is a combination of these two ideas. It uses a binary tree
LZ77 encoder with dynamic Huffman encoding (1978) of the output.
The advantage of using dynamic Huffman encoding is adaptivity to
rapidly changing data. The disadvantage is slow decoding.
PKZIP 1.0, I believe, uses a combination of binary tree encoding
with SHANNON-FANO (1949) static encoding. Static SHANNON-FANO
encoding has the advantage of fast decoding and the disadvantage of
less optimal compression especially on rapidly changing data.
PKZIP will perform better in compression than LHARC on many large
files because PKZIP uses a larger window (8K) than LHARC (4K).
LHARC will perform best on binary data with rapidly changing
characteristics like executables that include a lot of text data.
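A rough sketch of static SHANNON-FANO code construction (my reading
of the method, for illustration only - this is not PKZIP's actual
implementation):

  # Shannon-Fano sketch: sort symbols by frequency, then repeatedly
  # split the list into two parts of roughly equal total frequency,
  # assigning a 0 bit to one half and a 1 bit to the other.
  from collections import Counter

  def shannon_fano(symbols, codes=None, prefix=""):
      if codes is None:
          codes = {}
      if len(symbols) == 1:
          codes[symbols[0][0]] = prefix or "0"
          return codes
      total = sum(f for _, f in symbols)
      running, split = 0, 1
      for i, (_, f) in enumerate(symbols):
          if running + f > total / 2 and i > 0:
              split = i
              break
          running += f
          split = i + 1
      split = min(split, len(symbols) - 1)   # keep both halves non-empty
      shannon_fano(symbols[:split], codes, prefix + "0")
      shannon_fano(symbols[split:], codes, prefix + "1")
      return codes

  freq = sorted(Counter(b"aaaabbc").items(), key=lambda kv: -kv[1])
  print(shannon_fano(freq))   # 'a' gets a shorter code than 'b' and 'c'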
LH (LHarc 2.0), I believe, uses a digital TRIE LZ77 encoder with a
static Huffman encoder. It uses an 8K window. LH uses a
particularly efficient Huffman encoder which works on small data
sets. Many static encoders suffer from the cost of including the
table of codes with the encoded data. The digital TRIE has the
advantage of always producing the most recent match unlike binary
trees. This recency of match is significant when secondary
compression is used. Much of this work derives from, and improves
upon, literature published in the last two years.
It will be interesting to see if developers can significantly
improve upon LH since it is based upon the most recent published
research in data compression.
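To make the "most recent match" point concrete, here is a toy
Python sketch of a digital trie index over the window (my own guess
at the shape of such a structure, not LH's actual code):

  # Each trie node remembers the MOST RECENT position at which its
  # string occurred, so a lookup naturally returns the closest match.
  # Small, recent offsets are easier for a secondary (Huffman) stage
  # to compress.
  class TrieNode:
      def __init__(self):
          self.children = {}
          self.last_pos = -1            # most recent occurrence

  def insert(root, data, pos, max_depth=16):
      node = root
      for ch in data[pos:pos + max_depth]:
          node = node.children.setdefault(ch, TrieNode())
          node.last_pos = pos           # overwrite with the newer position

  def longest_match(root, data, pos, max_depth=16):
      node, length, match_pos = root, 0, -1
      for ch in data[pos:pos + max_depth]:
          if ch not in node.children:
              break
          node = node.children[ch]
          length += 1
          match_pos = node.last_pos
      return match_pos, length          # most recent occurrence, length

  data = b"abcxabcyabc"
  root = TrieNode()
  for p in range(8):                    # index everything before position 8
      insert(root, data, p)
  print(longest_match(root, data, 8))   # -> (4, 3): the "abc" at 4, not at 0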
One of the most significant new encoders published is that of Fiala
and Greene, 1989. This method uses pointers that point to nodes in
the TREE data structure as opposed to the phrases in the window.
This bypasses the problem of redundancy of phrases in any given
window. Since each node is unique, fewer pointers are needed and
thus are more easily compressed by a secondary compressor. This
technique is currently patent pending.
I hope that this synopsis of compression research has been
enlightening.
End of document