Message-ID: <9A043F3CF02CD34C8E74AC1594475C738A3471AA@uxcn10-tdc06.UoA.auckland.ac.nz>
Date: Sun, 6 Apr 2014 08:01:16 +0000
From: Peter Gutmann <pgut001@...auckland.ac.nz>
To: "discussions@...sword-hashing.net" <discussions@...sword-hashing.net>
Subject: RE: [PHC] Mechanical tests (was: POMELO fails the dieharder tests)
"Dennis E. Hamilton" <dennis.hamilton@....org> writes:
>I still think compressibility is a very good idea, although one might want to
>rip out the part of a compressor that determines compressibility and simply
>report compressibility (or efficiency) directly, without the overhead of
>creating the compressed file with its associated tables and codings.
What you're talking about here is an entropy estimator, i.e. something
like DieHard(er). The gzip test is quick and easy, but a pretty bad
entropy estimator: the statistical compressor used is a semi-static
Huffman encoder, which does a really bad job of compressing high-entropy
data (and other types of data as well; since it can't deal with
fractional bits the way an arithmetic encoder can, you can create
pathological data that a Huffman encoder will always expand). The LZ77
part of the compression (strictly speaking LZSS in zlib, but that's just
how the output is encoded), which was never meant to be a compression
algorithm but was created as an entropy estimator, does get
asymptotically close to the source entropy, but only as the source
length approaches infinity.
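
For illustration, a minimal Python sketch of the gzip-style check being
discussed, using only the standard zlib module (the function name is
illustrative, not anything from the PHC code):

import os
import zlib

def deflate_bits_per_byte(data: bytes) -> float:
    """Bits of DEFLATE output per input byte, as a crude entropy proxy."""
    compressed = zlib.compress(data, 9)
    return 8.0 * len(compressed) / len(data)

if __name__ == "__main__":
    # High-entropy input: the Huffman stage can't get below 8 bits/byte,
    # and header/block overhead pushes the result slightly above it.
    for size in (64, 1024, 1 << 20):
        sample = os.urandom(size)
        print(size, round(deflate_bits_per_byte(sample), 3))

On random input this reports a little more than 8 bits/byte (the
expansion mentioned above), and on short inputs the LZ77 stage hasn't
had a chance to converge either.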
If you're going to use a data compression algorithm as an entropy
estimator then what you'd really want is an adaptive arithmetic encoder.
However, as you say, you don't need the encoding part at all, only an
entropy estimate, and for that any decent statistical suite will do.
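
For completeness, a sketch of reporting an estimate directly with no
compressed output at all, along the lines Dennis suggests: an order-0
(byte-frequency) Shannon estimate. A real statistical suite does far
more than this, but it shows the idea of skipping the encoder entirely:

import math
from collections import Counter

def shannon_bits_per_byte(data: bytes) -> float:
    """Order-0 Shannon estimate: bits per byte from byte frequencies alone."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

Fed random bytes it gives roughly 8.0, and an all-zero buffer gives 0.0,
with no Huffman tables or coded output ever being built.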
Peter.