linux-kernel - Re: [PATCH v2] mm/zswap: store <PAGE_SIZE compression failed page as-is

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20250814170426.71159-1-sj@kernel.org>
Date: Thu, 14 Aug 2025 10:04:26 -0700
From: SeongJae Park <sj@...nel.org>
To: Johannes Weiner <hannes@...xchg.org>
Cc: SeongJae Park <sj@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Chengming Zhou <chengming.zhou@...ux.dev>,
	David Hildenbrand <david@...hat.com>,
	Nhat Pham <nphamcs@...il.com>,
	Yosry Ahmed <yosry.ahmed@...ux.dev>,
	linux-kernel@...r.kernel.org,
	linux-mm@...ck.org,
	Takero Funaki <flintglass@...il.com>
Subject: Re: [PATCH v2] mm/zswap: store <PAGE_SIZE compression failed page as-is

On Thu, 14 Aug 2025 12:23:50 -0400 Johannes Weiner <hannes@...xchg.org> wrote:

> On Tue, Aug 12, 2025 at 10:00:46AM -0700, SeongJae Park wrote:
> > When zswap writeback is enabled and it fails compressing a given page,
> > the page is swapped out to the backing swap device.  This behavior
> > breaks the zswap's writeback LRU order, and hence users can experience
> > unexpected latency spikes.  If the page is compressed without failure,
> > but results in a size of PAGE_SIZE, the LRU order is kept, but the
> > decompression overhead for loading the page back on the later access is
> > unnecessary.
> > 
> > Keep the LRU order and optimize unnecessary decompression overheads in
> > those cases, by storing the original content as-is in zpool.  The length
> > field of zswap_entry will be set appropriately, as PAGE_SIZE,  Hence
> > whether it is saved as-is or not (whether decompression is unnecessary)
> > is identified by 'zswap_entry->length == PAGE_SIZE'.
[...]
> > Tests
> > -----
> > 
> > I tested this patch using a simple self-written microbenchmark that is
> > available at GitHub[1].  You can reproduce the test I did by executing
> > run_tests.sh of the repo on your system.  Note that the repo's
> > documentation is not good as of this writing, so you may need to read
> > and use the code.
> > 
> > The basic test scenario is simple.  Run a test program making artificial
> > accesses to memory having artificial content under memory.high-set
> > memory limit and measure how many accesses were made in given time.
> > 
> > The test program repeatedly and randomly access three anonymous memory
> > regions.  The regions are all 500 MiB size, and accessed in the same
> > probability.  Two of those are filled up with a simple content that can
> > easily be compressed, while the remaining one is filled up with a
> > content that read from /dev/urandom, which is easy to fail at
> > compressing to <PAGE_SIZE size.  The program runs for two minutes and
> > prints out the number of accesses made every five seconds.
> > 
> > The test script runs the program under below seven configurations.
> 
>                                                four?

Nice catch, thank you!  I will fix this in the next version.

> 
> > - 0: memory.high is set to 2 GiB, zswap is disabled.
> > - 1-1: memory.high is set to 1350 MiB, zswap is disabled.
> > - 1-2: On 1-1, zswap is enabled without this patch.
> > - 1-3: On 1-2, this patch is applied.
> 
> I'm not sure 0 and 1-1 are super interesting, since we care about
> zswap performance under pressure before and after the patch.

You're right, the main focus should be 1-2 vs 1-3.

I added 0 and 1-1, too, though, just for showing how artificial and micro the
given test setup is.

That is, as I also mentioned on the original mail, this test setup is for
pretty extreme pressure case.  To quote,

        config            0       1-1     1-2      1-3
        perf_normalized   1.0000  0.0057  0.0235   0.0367
    [...]
    Note that ~10% working set pressure is already extreme, at least
    on this test setup.  No one would desire system setups that can degrade
    performance to 0.57% of the best case.

Enabling zswap improves the performance to 2.35% of the ideal setup, and this
patch further improves it to 3.67%.  But still no one would want a setup that
achieves only 3.67% of the ideal performance.  The performance itself is also
a pure memory random access performance, which is again far from those for real
workloads, though.

So, this is a microbenchmarking test that pretty far from the real world.

I will add more description about the intention of the configs in the next
version.  Let me know if you think it is only disturbing reading of the patch
and better to be just dropped.
 
> 
> > For all zswap enabled cases, zswap shrinker is enabled.
> 
> It would, however, be good to test without the shrinker as well, since
> it's currently optional and not enabled by default.

Agreed.  Actually we ran without shrinker for RFC v1[1] of this patch.  The
version also showed performance improvement in the configuration.  Since we
made no big change after the version, I expect no big difference.

Nonetheless, I will run the test with zswap shrinker disabled setup again with
this version, and share the results.  Unless the results is far from my
expectation, I will directly post the next version of this patch with abovely
promised changelog updates.

> 
> The performance with the shrinker looks great, but if it regresses for
> setups without the shrinker we need additional gating.

Agreed.  If I find a regression from the abovely promised rerun of the test, I
will share that as a reply to this thread, and start a discussion for the
possible additional gating.

[1] https://lore.kernel.org/20250730234059.4603-1-sj@kernel.org


Thanks,
SJ