Message-ID: <CAKEwX=O+7=rTjaUEQ+3v=f9UHF_GOXWQO9HVZtLZOPCTRh6sVw@mail.gmail.com>
Date: Fri, 12 Jul 2024 15:36:29 -0700
From: Nhat Pham <nphamcs@...il.com>
To: Takero Funaki <flintglass@...il.com>
Cc: Johannes Weiner <hannes@...xchg.org>, Yosry Ahmed <yosryahmed@...gle.com>, 
	Chengming Zhou <chengming.zhou@...ux.dev>, Jonathan Corbet <corbet@....net>, 
	Andrew Morton <akpm@...ux-foundation.org>, 
	Domenico Cerasuolo <cerasuolodomenico@...il.com>, linux-mm@...ck.org, linux-doc@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 5/6] mm: zswap: store incompressible page as-is

On Sun, Jul 7, 2024 at 2:38 AM Takero Funaki <flintglass@...il.com> wrote:
>
> On Sun, Jul 7, 2024 at 8:53 AM Nhat Pham <nphamcs@...il.com> wrote:
> >
> > I tried to propose something similar in the past. Please read the
> > following discussion:
> >
> > https://lore.kernel.org/all/CAJD7tka6XRyzYndRNEFZmi0Zj4DD2KnVzt=vMGhfF4iN2B4VKw@mail.gmail.com/
> >
> > But, the TLDR is Yosry was (rightly) concerned that with this
> > approach, memory reclaiming could end up increasing memory usage
> > rather than reducing it (since we do not free up the pages that fail
> > to zswap-out, and we need extra memory for the zswap metadata of
> > those pages).
> >
> > So my vote on this patch would be NACK, until we get around this issue
> > somehow :)
>
> It seems the discussion in that thread conflated memory allocation
> failure (the system runs out of memory reserves) with incompressible
> pages (compression succeeds, but the result is equal to or larger
> than PAGE_SIZE).
>
> zswap has been storing pages in dedicated pages, 1:1, when they
> compress to near PAGE_SIZE. Using zsmalloc, current zswap stores pages
> that compress to between 3633 bytes (= hugeclass + 1) and 4095 bytes
> (= PAGE_SIZE - 1) in one page each. This patch extends that range to
> 3633 to 4096 by treating PAGE_SIZE as a special case. I could not find
> a reason to reject exactly PAGE_SIZE while accepting PAGE_SIZE - 1.
>

I'm not actually sure this is true in practice. While zsmalloc does
have the capability to store near-PAGE_SIZE objects, whether such
objects ever reach it also depends on the compression algorithm.

At Meta, we use zstd. What I have found is that, a lot of the time, it
just flat out rejects the page if it compresses too poorly. Without
this change we do not pay the memory overhead of the zswap_entry
structures for these rejected pages; with this change, we will.
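
A quick userspace analogue of that behavior, using libzstd directly
(whether the in-kernel path ends up with an equivalently capped
destination buffer depends on the backend wiring, so treat this as a
sketch rather than a statement about mm/zswap.c):

	/* demo.c - does zstd "reject" an incompressible 4 KiB page
	 * when the destination buffer is capped at PAGE_SIZE?
	 * Build: cc demo.c -lzstd */
	#include <stdio.h>
	#include <stdlib.h>
	#include <zstd.h>

	#define PAGE_SIZE 4096

	int main(void)
	{
		unsigned char src[PAGE_SIZE], dst[PAGE_SIZE];
		size_t ret;
		int i;

		/* Fill the "page" with pseudo-random, incompressible bytes. */
		srand(42);
		for (i = 0; i < PAGE_SIZE; i++)
			src[i] = rand() & 0xff;

		ret = ZSTD_compress(dst, sizeof(dst), src, sizeof(src), 3);
		if (ZSTD_isError(ret))
			printf("rejected: %s\n", ZSTD_getErrorName(ret));
		else
			printf("accepted: %zu bytes\n", ret);
		return 0;
	}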

We might need to run some tracing to get a histogram of the
distribution of post-compression sizes.
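
One low-tech way to collect that distribution on a debug build would be
a temporary trace_printk() at the end of zswap_compress() - placement
is approximate, assuming the local dlen variable as in current
mm/zswap.c:

	/* Debug hack, not for merging: log per-page compressed sizes;
	 * read them back from /sys/kernel/tracing/trace and bucket
	 * them offline into a histogram. */
	trace_printk("zswap compress dlen=%u\n", dlen);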

> zswap wastes memory for metadata for all accepted pages but reduces IO

Key word: accepted. The compression algorithm might already have some
built-in logic to reject poorly compressed pages, preventing the cases
where the metadata overhead would outweigh the savings.
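
For concreteness, here is a simplified sketch of where that
accept/reject decision sits and what the proposed patch would change.
This is illustrative only - compress() and the names are stand-ins, not
the actual mm/zswap.c code:

	/* Illustrative accept/reject point in the zswap store path. */
	static bool zswap_store_sketch(void *page, void *scratch)
	{
		size_t dlen = PAGE_SIZE;

		/* The backend may itself error out on poorly compressible
		 * data; treat an error like an incompressible page. */
		if (compress(page, PAGE_SIZE, scratch, &dlen) ||
		    dlen >= PAGE_SIZE) {
			/*
			 * Today: reject, and the page takes the regular
			 * swapout path. With the patch: copy the page in
			 * as-is (dlen = PAGE_SIZE), keeping it on the
			 * zswap LRU at the cost of zero space savings
			 * plus one zswap_entry of metadata.
			 */
			return false;
		}

		/* dlen < PAGE_SIZE: allocate a zpool object of dlen
		 * bytes and copy the compressed buffer in. */
		return true;
	}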

> amount and latency by keeping compressed buffers in memory. For pages
> between 3633 and 4096 bytes, zswap reduces only the latency. This is
> still beneficial, because the rare incompressible pages would
> otherwise trigger urgent pageout IO and incur head-of-line blocking
> on the subsequent pages. It also preserves the pages' LRU ordering,
> which matters for pagein latency.
>
> In the worst case, or with a malicious dataset, zswap will waste a
> significant amount of memory, but this patch neither affects nor
> resolves that scenario. For example, if a user allocates pages that
> compress to 3633 bytes, current zswap using zsmalloc gains no memory,
> as the compression ratio, including zsmalloc overhead, becomes 1:1.
> The same applies to zbud: the ratio will be 1:1, as zbud cannot find
> buddies smaller than 463 bytes. zswap will be less efficient but will
> still work in this situation, since max_pool_percent and background
> writeback keep the pool size from overwhelming usable memory.
>
> I suppose the current zswap has already accepted this possible waste
> of memory, at least since the current zswap_compress() logic was
> implemented. If zswap had to ensure a compression ratio better than
> 1:1, and preferred only reducing IO volume (not latency), there would
> be a compression ratio threshold rejecting pages that do not compress
> to under 2048 bytes. I think accepting nearly incompressible pages is
> beneficial, and extending the range to 4096 does not negatively
> affect the current behavior.
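
For illustration, the hypothetical threshold described above would be a
one-line check after compression (dlen being the post-compression size;
no such check exists in zswap today):

	/* Hypothetical: require at least 2:1 before accepting a page. */
	if (dlen >= PAGE_SIZE / 2)
		return false;	/* reject: fall back to regular swapout */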

FWIW, I do agree with your approach (storing incompressible pages in
the zswap pool to maintain LRU ordering) - this is *essentially* what
I was trying to do too with the attempt I mentioned above.

I'll let Johannes and Yosry chime in as well, since they were the
original folks who raised these concerns :) If they're happy then I'll
revoke my NACK.
