Message-ID: <019035e5ecae12390048b73c042ec54d@beldev.am>
Date: Thu, 10 Apr 2025 11:02:31 +0400
From: Igor Belousov <igor.b@...dev.am>
To: Johannes Weiner <hannes@...xchg.org>
Cc: Nhat Pham <nphamcs@...il.com>, vitaly.wool@...sulko.se,
linux-mm@...ck.org, akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
Shakeel Butt <shakeel.butt@...ux.dev>, Yosry Ahmed <yosryahmed@...gle.com>
Subject: Re: [PATCH v2] mm: add zblock allocator
> Hi Johannes,
>
>>> Sure. zstd/8 cores/make -j32:
>>>
>>> zsmalloc:
>>> real 7m36.413s
>>> user 38m0.481s
>>> sys 7m19.108s
>>> Zswap: 211028 kB
>>> Zswapped: 925904 kB
>>> zswpin 397851
>>> zswpout 1625707
>>> zswpwb 5126
>>>
>>> zblock:
>>> real 7m55.009s
>>> user 39m23.147s
>>> sys 7m44.004s
>>> Zswap: 253068 kB
>>> Zswapped: 919956 kB
>>> zswpin 456843
>>> zswpout 2058963
>>> zswpwb 3921
>>
>> So zstd results in nearly double the compression ratio, which in turn
>> cuts total execution time *almost in half*.
>>
>> The numbers speak for themselves. Compression efficiency >>> allocator
>> speed, because compression efficiency ultimately drives the continuous
>> *rate* at which allocations need to occur. You're trying to optimize a
>> constant coefficient at the expense of a higher-order one, which is a
>> losing proposition.
>
> Actually there's a slight bug in the zblock code for the 4K page case
> which caused storage inefficiency for small (== well-compressed) memory
> blocks. With that one fixed, the results look a lot brighter for
> zblock:
>
> 1. zblock/zstd/8 cores/make -j32 bzImage
> real 7m28.290s
> user 37m27.055s
> sys 7m18.629s
> Zswap: 221516 kB
> Zswapped: 904104 kB
> zswpin 425424
> zswpout 2011503
> zswpwb 4111
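As a rough cross-check, the effective compression ratio implied by the
figures quoted above is simply Zswapped/Zswap; a minimal sketch of that
arithmetic (values copied from the runs above):

# Rough cross-check: effective zswap compression ratio estimated as
# Zswapped / Zswap (both in kB), using the values quoted above.
runs = {
    "zsmalloc/zstd":             (211028, 925904),
    "zblock/zstd (pre-bugfix)":  (253068, 919956),
    "zblock/zstd (with bugfix)": (221516, 904104),
}
for name, (zswap_kb, zswapped_kb) in runs.items():
    print(f"{name:27s} {zswapped_kb / zswap_kb:.2f}:1")
# Prints roughly:
#   zsmalloc/zstd               4.39:1
#   zblock/zstd (pre-bugfix)    3.64:1
#   zblock/zstd (with bugfix)   4.08:1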
For the sake of completeness, I re-ran that test with the bugfix and LZ4
(so, zblock/lz4/8 cores/make -j32 bzImage) and got:
real 7m44.154s
user 38m26.645s
sys 7m38.302s
zswpin 648108
zswpout 2490449
zswpwb 9499
So there's *no* significant cut in execution time with zstd, even on a
Ryzen 9, and that invalidates your point. Sorry for the earlier
confusion; it was an honest mistake on our side. If zsmalloc hadn't
OOMed with lz4, we probably would have seen the discrepancy and found
the bug earlier. And on the ARM64 and RISC-V targets we have run the
tests on, zstd is slower than lz4.
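For completeness, the figures above come from standard interfaces:
Zswap/Zswapped from /proc/meminfo and the zswpin/zswpout/zswpwb counters
from /proc/vmstat. A minimal collection sketch follows; selecting the
zpool backend and compressor through /sys/module/zswap/parameters/ is an
assumption about the test setup, not something stated in this thread.

# Minimal sketch (assumes a Linux host with CONFIG_ZSWAP and a recent
# enough kernel to expose the zswpwb counter).
def read_counters(path, wanted):
    """Return {name: int} for the requested lines of a /proc-style file."""
    values = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            name = fields[0].rstrip(":")
            if name in wanted:
                values[name] = int(fields[1])
    return values

meminfo = read_counters("/proc/meminfo", {"Zswap", "Zswapped"})  # kB
vmstat = read_counters("/proc/vmstat", {"zswpin", "zswpout", "zswpwb"})
for name, value in {**meminfo, **vmstat}.items():
    print(name, value)

# Backend/compressor selection before a run (as root), e.g.:
#   echo zblock > /sys/module/zswap/parameters/zpool
#   echo zstd   > /sys/module/zswap/parameters/compressor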
/Igor