Message-ID: <019035e5ecae12390048b73c042ec54d@beldev.am>
Date: Thu, 10 Apr 2025 11:02:31 +0400
From: Igor Belousov <igor.b@...dev.am>
To: Johannes Weiner <hannes@...xchg.org>
Cc: Nhat Pham <nphamcs@...il.com>, vitaly.wool@...sulko.se,
linux-mm@...ck.org, akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
Shakeel Butt <shakeel.butt@...ux.dev>, Yosry Ahmed <yosryahmed@...gle.com>
Subject: Re: [PATCH v2] mm: add zblock allocator
> Hi Johannes,
>
>>> Sure. zstd/8 cores/make -j32:
>>>
>>> zsmalloc:
>>> real 7m36.413s
>>> user 38m0.481s
>>> sys 7m19.108s
>>> Zswap: 211028 kB
>>> Zswapped: 925904 kB
>>> zswpin 397851
>>> zswpout 1625707
>>> zswpwb 5126
>>>
>>> zblock:
>>> real 7m55.009s
>>> user 39m23.147s
>>> sys 7m44.004s
>>> Zswap: 253068 kB
>>> Zswapped: 919956 kB
>>> zswpin 456843
>>> zswpout 2058963
>>> zswpwb 3921
>>
>> So zstd results in nearly double the compression ratio, which in turn
>> cuts total execution time *almost in half*.
>>
>> The numbers speak for themselves. Compression efficiency >>> allocator
>> speed, because compression efficiency ultimately drives the continuous
>> *rate* at which allocations need to occur. You're trying to optimize a
>> constant coefficient at the expense of a higher-order one, which is a
>> losing proposition.
>
> Actually there's a slight bug in the zblock code for the 4K page case
> which caused storage inefficiency for small (== well-compressed) memory
> blocks. With that one fixed, the results look a lot brighter for
> zblock:
>
> 1. zblock/zstd/8 cores/make -j32 bzImage
> real 7m28.290s
> user 37m27.055s
> sys 7m18.629s
> Zswap: 221516 kB
> Zswapped: 904104 kB
> zswpin 425424
> zswpout 2011503
> zswpwb 4111
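As a rough cross-check, the effective compression ratio implied by the
figures quoted above is simply Zswapped/Zswap; a minimal sketch of that
arithmetic (values copied from the runs above):

# Rough cross-check: effective zswap compression ratio estimated as
# Zswapped / Zswap (both in kB), using the values quoted above.
runs = {
    "zsmalloc/zstd":             (211028, 925904),
    "zblock/zstd (pre-bugfix)":  (253068, 919956),
    "zblock/zstd (with bugfix)": (221516, 904104),
}
for name, (zswap_kb, zswapped_kb) in runs.items():
    print(f"{name:27s} {zswapped_kb / zswap_kb:.2f}:1")
# Prints roughly:
#   zsmalloc/zstd               4.39:1
#   zblock/zstd (pre-bugfix)    3.64:1
#   zblock/zstd (with bugfix)   4.08:1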
For the sake of completeness, I re-ran that test with the bugfix and LZ4
(so, zblock/lz4/8 cores/make -j32 bzImage) and got:
real 7m44.154s
user 38m26.645s
sys 7m38.302s
zswpin 648108
zswpout 2490449
zswpwb 9499
So there's *no* significant cut in execution time with zstd, even on a
Ryzen 9, and that invalidates your point. Sorry for the earlier
confusion; it was an honest mistake on our side. If zsmalloc hadn't
OOMed with lz4, we probably would have seen the discrepancy and found
the bug earlier. And on the ARM64 and RISC-V targets we have run the
tests on, zstd is slower than lz4.
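For completeness, the figures above come from standard interfaces:
Zswap/Zswapped from /proc/meminfo and the zswpin/zswpout/zswpwb counters
from /proc/vmstat. A minimal collection sketch follows; selecting the
zpool backend and compressor through /sys/module/zswap/parameters/ is an
assumption about the test setup, not something stated in this thread.

# Minimal sketch (assumes a Linux host with CONFIG_ZSWAP and a recent
# enough kernel to expose the zswpwb counter).
def read_counters(path, wanted):
    """Return {name: int} for the requested lines of a /proc-style file."""
    values = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            name = fields[0].rstrip(":")
            if name in wanted:
                values[name] = int(fields[1])
    return values

meminfo = read_counters("/proc/meminfo", {"Zswap", "Zswapped"})  # kB
vmstat = read_counters("/proc/vmstat", {"zswpin", "zswpout", "zswpwb"})
for name, value in {**meminfo, **vmstat}.items():
    print(name, value)

# Backend/compressor selection before a run (as root), e.g.:
#   echo zblock > /sys/module/zswap/parameters/zpool
#   echo zstd   > /sys/module/zswap/parameters/compressor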
/Igor