Message-ID: <EED80977-EF3A-49C1-9C11-F19DC819C9CD@fb.com>
Date:   Wed, 11 Oct 2017 02:01:41 +0000
From:   Nick Terrell <terrelln@...com>
To:     Adam Borowski <kilobyte@...band.pl>
CC:     "hpa@...or.com" <hpa@...or.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "x86@...nel.org" <x86@...nel.org>,
        "Kernel Team" <Kernel-team@...com>, Chris Mason <clm@...com>,
        Yann Collet <cyan@...com>, Rene Rebe <rene@...ctcode.com>
Subject: Re: [PATCH 0/2] Add support for ZSTD-compressed kernel

On 10/10/17, 5:08 PM, "Adam Borowski" <kilobyte@...band.pl> wrote:
> On Tue, Oct 10, 2017 at 10:40:13PM +0000, Nick Terrell wrote:
> > On 10/10/17, 2:56 PM, "hpa@...or.com" <hpa@...or.com> wrote:
> > >On October 10, 2017 2:22:42 PM PDT, Nick Terrell <terrelln@...com> wrote:
> > >>This patch set adds support for a ZSTD-compressed kernel and ramdisk
> > >>images in the kernel boot process. It only integrates the support with
> > >>x86, though the first patch is generic to all architectures.
> > >>
> > >>Zstandard requires slightly more memory during the kernel decompression
> > >>on x86 (192 KB vs 64 KB), and the memory usage is independent of the
> > >>window size.
> > >>
> > >>Zstandard requires memory proportional to the window size used during
> > >>compression for decompressing the ramdisk image, since streaming mode
> > >>is used. Newer versions of zstd (1.3.2+) list the window size of a file
> > >>with `zstd -lv <file>'. The absolute maximum amount of memory required
> > >>is just over 8 MB.

> > > And, pray tell, what are the actual results?  What is the trade-off of
> > > kernel size versus decompression performance versus the other algorithms
> > > that we already support?  Adding algorithms for their own sake is a bad
> > > thing not a good thing.
> >
> > Sorry, I neglected to include the benchmarks I've run so far. I've
> > included them below, and will add them to the next version's cover letter.
> >
> > Comparing the command line tools on a kernel image that is 68970616 B large:
> >
> > | Algorithm | Compression Ratio | Decompression MB/s |
> > |-----------|-------------------|--------------------|
> > | zstd      |              4.42 |              436.5 |
> > | gzip      |              3.72 |              134.1 |
> > | xz        |              4.83 |               53.1 |
> > | lz4       |              3.18 |             1682.2 |
> > | lzo       |              3.36 |              389.6 |
> > | bzip2     |              4.03 |               33.3 |

> Perhaps it'd be a good idea to cull some of the bad algorithms?  I don't
> know the memory used by those, but the envelope of the table you've just
> shown suggests that using bzip2 and lzo is pointless.  So is gzip, but it's
> widespread as the default for initramfs producers, thus it'd be unsafe to
> kill it.

I'm not sure there is a great use case for bzip2: it requires more memory
than xz, compresses worse, and decompresses slower. lzo in the kernel might
decompress a bit faster than zstd (looking back at the BtrFS benchmarks, it
did), and more importantly it uses less memory than zstd: when decompressing
the kernel zstd needs only 192 KB, but for the initramfs it will need more.
Still, unless you really need lzo's ~5% better compression ratio, lz4 is
probably the better option when you want speed.

> > I know that this isn't a real benchmark of the kernel decompression. I
> > still need to figure out how to time the kernel decompression; if you
> > have any suggestions, let me know. Otherwise, I'll get back to you when
> > I've figured out how to run the benchmark.

I've found a way to benchmark the kernel decompression time during boot
with QEMU: I add a timestamp to every line of the console output. I also
had to print 100 lines before the decompression starts to get consistent
results.
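
As a cross-check, here is a minimal sketch of timing only the decompression
step from inside the boot stub, rather than timestamping the console output.
boot_rdtsc() is a made-up helper; __decompress(), debug_putstr(), and the
call site in arch/x86/boot/compressed/misc.c are the real names, though the
exact argument list should be checked against the tree:

static inline unsigned long long boot_rdtsc(void)
{
        unsigned int lo, hi;

        /* Raw TSC read; good enough for relative timings under QEMU. */
        asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
        return ((unsigned long long)hi << 32) | lo;
}

/* In extract_kernel(), around the existing decompression call: */
        unsigned long long t0, t1;

        t0 = boot_rdtsc();
        __decompress(input_data, input_len, NULL, NULL, output,
                     output_len, NULL, error);
        t1 = boot_rdtsc();
        /* t1 - t0 is the decompression cost in TSC cycles; format it
         * into a buffer and report it with debug_putstr(). */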

I've found that zstd decompresses 2x slower than it should. I narrowed the
problem down to ZSTD_wildcopy() and ZSTD_copy8() in
lib/zstd/zstd_internal.h: ZSTD_wildcopy() calls memcpy(dst, src, 8) in a
loop (via ZSTD_copy8()), and the freestanding memcpy() doesn't handle that
well. Replacing it with __builtin_memcpy(dst, src, 8) doubles the speed.
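
For context, the helpers in lib/zstd/zstd_internal.h look roughly like this
(paraphrased from memory and shortened, with the one-line fix already
applied to ZSTD_copy8()):

static void ZSTD_copy8(void *dst, const void *src)
{
        /* Was memcpy(dst, src, 8): the freestanding memcpy() turns this
         * into an opaque call, while __builtin_memcpy() lets gcc see the
         * constant size and emit a single 8-byte load/store. */
        __builtin_memcpy(dst, src, 8);
}

#define COPY8(d, s)                       \
        do {                              \
                ZSTD_copy8(d, s);         \
                d += 8;                   \
                s += 8;                   \
        } while (0)

/* Copies length bytes in 8-byte steps, possibly writing up to 7 bytes
 * past dst + length, so callers leave slack after the output buffer. */
static void ZSTD_wildcopy(void *dst, const void *src, ptrdiff_t length)
{
        const BYTE *ip = (const BYTE *)src;
        BYTE *op = (BYTE *)dst;
        BYTE *const oend = op + length;

        do {
                COPY8(op, ip);
        } while (op < oend);
}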

I'm not an expert in freestanding gcc compilation, but I believe it is okay
to call __builtin_memcpy() in freestanding mode: gcc will either inline it
or emit a call to the right function, and either way it can apply its usual
memcpy() analysis, e.g. turning a fixed-size 8-byte copy into a single
load/store pair. I also see that arch/x86/boot/string.h defines memcpy() as
__builtin_memcpy(). Is it safe to use __builtin_memcpy() directly in
lib/zstd/zstd_internal.h?
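
For reference, the defines I mean in arch/x86/boot/string.h (quoting from
memory, so worth double-checking):

#define memcpy(d,s,l) __builtin_memcpy(d,s,l)
#define memset(d,c,l) __builtin_memset(d,c,l)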

If so, I'll submit a separate patch, and make sure to benchmark the
existing use cases (BtrFS and SquashFS).

