Message-ID: <20170810190055.GA97400@gmail.com>
Date: Thu, 10 Aug 2017 12:00:55 -0700
From: Eric Biggers <ebiggers3@...il.com>
To: Chris Mason <clm@...com>
Cc: Nick Terrell <terrelln@...com>,
Herbert Xu <herbert@...dor.apana.org.au>, kernel-team@...com,
squashfs-devel@...ts.sourceforge.net, linux-btrfs@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-crypto@...r.kernel.org
Subject: Re: [PATCH v5 2/5] lib: Add zstd modules
On Thu, Aug 10, 2017 at 01:41:21PM -0400, Chris Mason wrote:
> On 08/10/2017 04:30 AM, Eric Biggers wrote:
> >On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:
>
> >>The memory reported is the amount of memory the compressor requests.
> >>
> >>| Method | Size (B) | Time (s) | Ratio | MB/s | Adj MB/s | Mem (MB) |
> >>|----------|----------|----------|-------|---------|----------|----------|
> >>| none | 211988480 | 0.100 | 1 | 2119.88 | - | - |
> >>| zstd -1 | 73645762 | 1.044 | 2.878 | 203.05 | 224.56 | 1.23 |
> >>| zstd -3 | 66988878 | 1.761 | 3.165 | 120.38 | 127.63 | 2.47 |
> >>| zstd -5 | 65001259 | 2.563 | 3.261 | 82.71 | 86.07 | 2.86 |
> >>| zstd -10 | 60165346 | 13.242 | 3.523 | 16.01 | 16.13 | 13.22 |
> >>| zstd -15 | 58009756 | 47.601 | 3.654 | 4.45 | 4.46 | 21.61 |
> >>| zstd -19 | 54014593 | 102.835 | 3.925 | 2.06 | 2.06 | 60.15 |
> >>| zlib -1 | 77260026 | 2.895 | 2.744 | 73.23 | 75.85 | 0.27 |
> >>| zlib -3 | 72972206 | 4.116 | 2.905 | 51.50 | 52.79 | 0.27 |
> >>| zlib -6 | 68190360 | 9.633 | 3.109 | 22.01 | 22.24 | 0.27 |
> >>| zlib -9 | 67613382 | 22.554 | 3.135 | 9.40 | 9.44 | 0.27 |
> >>
> >
> >These benchmarks are misleading because they compress the whole file as a
> >single stream without resetting the dictionary, which isn't how data will
> >typically be compressed in kernel mode. With filesystem compression the data
> >has to be divided into small chunks that can each be decompressed independently.
> >That eliminates one of the primary advantages of Zstandard (support for large
> >dictionary sizes).
>
> I did btrfs benchmarks of kernel trees and other normal data sets as
> well. The numbers were in line with what Nick is posting here.
> zstd is a big win over both lzo and zlib from a btrfs point of view.
>
> It's true Nick's patches only support a single compression level in
> btrfs, but that's because btrfs doesn't have a way to pass in the
> compression ratio. It could easily be a mount option, it was just
> outside the scope of Nick's initial work.
>
I am not surprised --- Zstandard is closer to the state of the art, both
format-wise and implementation-wise, than the other choices in BTRFS. My point
is that benchmarks need to account for how much data is compressed at a time.
This is a common mistake when comparing different compression algorithms; the
algorithm name and compression level do not tell the whole story. The
dictionary size is extremely significant. No one is going to compress or
decompress a 200 MB file as a single stream in kernel mode, so it does not make
sense to justify adding Zstandard *to the kernel* based on such a benchmark. It
is going to be divided into chunks. How big are the chunks in BTRFS? I thought
that it compressed only one page (4 KiB) at a time, but I hope that has been, or
is being, improved; 32-128 KiB would be a more reasonable amount.  (And if
the amount of data compressed at a time differs between the algorithms, note
that BTRFS benchmarks are likely to be measuring that as much as the
algorithms themselves.)
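
To make the comparison concrete, the sort of benchmark I have in mind would
compress the data in independent chunks, e.g. something along these lines in
userspace against the ordinary libzstd API (a rough sketch only; the 128 KiB
chunk size, the error handling, and the helper name are arbitrary choices for
illustration, not anything taken from Nick's patches):

#include <stdio.h>
#include <stdlib.h>
#include <zstd.h>

#define CHUNK_SIZE	(128 * 1024)

/*
 * Total compressed size when 'src' is compressed as independent
 * CHUNK_SIZE chunks.  Each ZSTD_compressCCtx() call starts from an
 * empty dictionary, so no chunk can reference history from a previous
 * one, which is the constraint a filesystem has to live with if it
 * wants to decompress each chunk on its own.  (Allocation failure
 * handling is omitted for brevity.)
 */
static size_t compress_chunked(const void *src, size_t src_size, int level)
{
        ZSTD_CCtx *cctx = ZSTD_createCCtx();
        size_t dst_cap = ZSTD_compressBound(CHUNK_SIZE);
        void *dst = malloc(dst_cap);
        size_t total = 0;
        size_t off;

        for (off = 0; off < src_size; off += CHUNK_SIZE) {
                size_t in_len = src_size - off;
                size_t out_len;

                if (in_len > CHUNK_SIZE)
                        in_len = CHUNK_SIZE;
                out_len = ZSTD_compressCCtx(cctx, dst, dst_cap,
                                            (const char *)src + off,
                                            in_len, level);
                if (ZSTD_isError(out_len)) {
                        fprintf(stderr, "zstd: %s\n",
                                ZSTD_getErrorName(out_len));
                        total = 0;
                        break;
                }
                total += out_len;
        }

        free(dst);
        ZSTD_freeCCtx(cctx);
        return total;
}

Running that over the same corpus at each level, next to a plain
ZSTD_compress() of the whole buffer, would show how much of the ratio
advantage in the table above comes from the large window rather than from
the format and implementation themselves.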
Eric