Date:   Thu, 12 Oct 2023 22:33:23 +0000
From:   Nick Terrell <terrelln@...a.com>
To:     Jonathan Neuschäfer <j.neuschaefer@....net>
CC:     Nick Terrell <terrelln@...a.com>, Arnd Bergmann <arnd@...db.de>,
        Linux ARM <linux-arm-kernel@...ts.infradead.org>,
        Russell King <linux@...linux.org.uk>,
        Nick Terrell <terrelln@...a.com>,
        Tony Lindgren <tony@...mide.com>,
        Geert Uytterhoeven <geert+renesas@...der.be>,
        Linus Walleij <linus.walleij@...aro.org>,
        Sebastian Reichel <sebastian.reichel@...labora.com>,
        "Hawkins, Nick" <nick.hawkins@....com>,
        Christophe Leroy <christophe.leroy@...roup.eu>,
        Florian Fainelli <f.fainelli@...il.com>,
        Nick Desaulniers <ndesaulniers@...gle.com>,
        Xin Li <xin3.li@...el.com>,
        Seung-Woo Kim <sw0312.kim@...sung.com>,
        Paul Bolle <pebolle@...cali.nl>,
        Bart Van Assche <bvanassche@....org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 0/3] ARM ZSTD boot compression



> On Apr 14, 2023, at 10:00 PM, Jonathan Neuschäfer <j.neuschaefer@....net> wrote:
> 
> On Thu, Apr 13, 2023 at 01:13:21PM +0200, Arnd Bergmann wrote:
>> On Wed, Apr 12, 2023, at 23:33, Arnd Bergmann wrote:
>>> On Wed, Apr 12, 2023, at 23:21, Jonathan Neuschäfer wrote:
>>>> This patchset enables ZSTD kernel (de)compression on 32-bit ARM.
>>>> Unfortunately, it is much slower than I hoped (tested on ARM926EJ-S):
>>>> 
>>>> - LZO:  7.2 MiB,  6 seconds
>>>> - ZSTD: 5.6 MiB, 60 seconds
>>> 
>>> That seems unexpected, as the usual numbers say it's about 25%
>>> slower than LZO. Do you have an idea why it is so much slower
>>> here? How long does it take to decompress the
>>> generated arch/arm/boot/Image file in user space on the same
>>> hardware using lzop and zstd?
>> 
>> I looked through this a bit more and found two interesting points:
>> 
>> - zstd uses a lot more unaligned loads and stores while
>>  decompressing. On armv5 those turn into individual byte
>>  accesses, while the others can likely use word-aligned
>>  accesses. This could make a huge difference if caches are
>>  disabled during the decompression.
>> 
>> - The sliding window on zstd is much larger, with the kernel
>>  using an 8MB window (zstd=23), compared to the normal 32kb
>>  for deflate (couldn't find the default for lzo), so on
>>  machines with no L2 cache, it is much more likely to thrash the
>>  small L1 dcache that is used on most arm9.
>> 
>>      Arnd
> 
> Makes sense.
> 
> For ZSTD as used in kernel decompression (the zstd22 configuration), the
> window is even bigger, 128 MiB. (AFAIU)

Sorry, I’m a bit late to the party, I wasn’t getting LKML email for some time...

But this is totally configurable. You can switch compression configurations
at any time. If you believe that the window size is the issue causing speed
regressions, you could compress with zstd using a smaller window, e.g. 256 KB,
like this:

  zstd -19 --zstd=wlog=18

This keeps the same search strength (level 19), but limits the decoder memory
usage.
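
For reference, the window size is 2^wlog bytes, so wlog=18 gives 256 KiB. A
minimal Python sketch of that arithmetic follows. The helper names are made up
for illustration, and wlog=27 is only inferred from the 128 MiB figure quoted
above, not read from any kernel configuration:

```python
def window_bytes(wlog: int) -> int:
    """Return the window size in bytes for a window log, i.e. 2**wlog."""
    if wlog < 0:
        raise ValueError("wlog must be non-negative")
    return 1 << wlog


def human(n: float) -> str:
    """Format a byte count using binary units (B, KiB, MiB, GiB)."""
    for unit in ("B", "KiB", "MiB", "GiB"):
        # Stop at the first unit where the value fits, or at the largest unit.
        if n < 1024 or unit == "GiB":
            return f"{n:g} {unit}"
        n /= 1024


if __name__ == "__main__":
    # 18: the value suggested above; 23: the 8MB window mentioned in the
    # thread; 27: assumed, derived from the 128 MiB figure in the thread.
    for wlog in (18, 23, 27):
        print(f"wlog={wlog}: {human(window_bytes(wlog))}")
```

Each step of wlog doubles the window, so going from 23 to 18 shrinks the
memory the decoder has to reference by a factor of 32.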

I will also try to get this patchset working on my machine and debug it.
The 10x slowdown is not expected, and we see much better speed in userspace
on ARM. I suspect it has something to do with the preboot environment.
E.g. when implementing x86-64 zstd kernel decompression, I noticed that
memcpy(dst, src, 16) wasn’t getting inlined properly, causing a massive
performance penalty.

Best,
Nick Terrell

> Thanks
> 
> Jonathan

