linux-kernel - Re: [PATCH v2] mm: add zblock allocator

Message-ID: <24e77aad-08ca-41c4-8e64-301fcc9370b1@konsulko.se>
Date: Tue, 8 Apr 2025 23:38:57 +0200
From: Vitaly Wool <vitaly.wool@...sulko.se>
To: Johannes Weiner <hannes@...xchg.org>, Igor Belousov <igor.b@...dev.am>
Cc: Nhat Pham <nphamcs@...il.com>, linux-mm@...ck.org,
 akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
 Shakeel Butt <shakeel.butt@...ux.dev>, Yosry Ahmed <yosryahmed@...gle.com>
Subject: Re: [PATCH v2] mm: add zblock allocator



On 4/8/25 21:55, Johannes Weiner wrote:
> On Tue, Apr 08, 2025 at 01:20:11PM +0400, Igor Belousov wrote:
>>>>>> Now what's funny is that when I tried to compare how 32 threaded build
>>>>>> would behave on an 8-core VM I couldn't do it because it OOMs with
>>>>>> zsmalloc as zswap backend. With zblock it doesn't, though, and the
>>>>>> results are:
>>>>>> real    12m14.012s
>>>>>> user    39m37.777s
>>>>>> sys     14m6.923s
>>>>>> Zswap:            440148 kB
>>>>>> Zswapped:         924452 kB
>>>>>> zswpin 594812
>>>>>> zswpout 2802454
>>>>>> zswpwb 10878
>>>>
>>>> It's LZ4 for all the test runs.
>>>
>>> Can you try zstd and let me know how it goes :)
>>
>> Sure. zstd/8 cores/make -j32:
>>
>> zsmalloc:
>> real	7m36.413s
>> user	38m0.481s
>> sys	7m19.108s
>> Zswap:            211028 kB
>> Zswapped:         925904 kB
>> zswpin 397851
>> zswpout 1625707
>> zswpwb 5126
>>
>> zblock:
>> real	7m55.009s
>> user	39m23.147s
>> sys	7m44.004s
>> Zswap:            253068 kB
>> Zswapped:         919956 kB
>> zswpin 456843
>> zswpout 2058963
>> zswpwb 3921
> 
> So zstd results in nearly double the compression ratio, which in turn
> cuts total execution time *almost in half*.
> 
> The numbers speak for themselves. Compression efficiency >>> allocator
> speed, because compression efficiency ultimately drives the continuous
> *rate* at which allocations need to occur. You're trying to optimize a
> constant coefficient at the expense of a higher-order one, which is a
> losing proposition.

Well, not really. This is an isolated use case with
a. significant computing power under the hood
b. relatively few cores
c. a relatively short test
d. 4K pages

If any of these doesn't hold, zblock dominates:
!a => zstd is too slow
!b => parallelization has a greater effect
!c => zsmalloc starts losing because it has to deal with internal fragmentation
!d => zblock's compression efficiency is better

Even !d alone makes zblock a better choice for ARM64-based servers.
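
For reference, the compression-ratio comparison in the quoted results can be reproduced from the numbers posted in this thread. The sketch below assumes that "Zswap" and "Zswapped" are the /proc/meminfo fields (compressed pool size and original size of the stored pages, respectively), so their quotient is an effective ratio that includes allocator overhead, not the raw compressor ratio. The dictionary keys and the helper name are made up for illustration.

```python
# Figures copied from the results quoted in this thread (all in kB).
# Assumption: "Zswap" is the compressed pool size and "Zswapped" is the
# uncompressed size of the pages stored in zswap, as in /proc/meminfo.
runs = {
    "lz4/zblock": {"zswap_kb": 440148, "zswapped_kb": 924452},
    "zstd/zsmalloc": {"zswap_kb": 211028, "zswapped_kb": 925904},
    "zstd/zblock": {"zswap_kb": 253068, "zswapped_kb": 919956},
}


def effective_ratio(run):
    """Uncompressed bytes stored per byte of pool memory (hypothetical helper)."""
    return run["zswapped_kb"] / run["zswap_kb"]


ratios = {name: effective_ratio(run) for name, run in runs.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

This gives roughly 2.1 for LZ4 with zblock, 4.4 for zstd with zsmalloc and 3.6 for zstd with zblock. Only the zstd pair isolates the allocator; the LZ4 figure comes from a single zblock run, since the zsmalloc run OOMed.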

~Vitaly