linux-kernel - Re: [PATCH v1 0/3] mm: zswap: global shrinker fix and proactive shrink


Message-ID: <CAJD7tkYD+y54-KYEotWspRdNL_AC0SxE147tR+dSLvY-=9jJyg@mail.gmail.com>
Date: Fri, 14 Jun 2024 17:19:24 -0700
From: Yosry Ahmed <yosryahmed@...gle.com>
To: Takero Funaki <flintglass@...il.com>
Cc: Nhat Pham <nphamcs@...il.com>, Johannes Weiner <hannes@...xchg.org>, 
	Chengming Zhou <chengming.zhou@...ux.dev>, Jonathan Corbet <corbet@....net>, 
	Andrew Morton <akpm@...ux-foundation.org>, 
	Domenico Cerasuolo <cerasuolodomenico@...il.com>, linux-mm@...ck.org, linux-doc@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1 0/3] mm: zswap: global shrinker fix and proactive shrink

On Thu, Jun 13, 2024 at 9:09 PM Takero Funaki <flintglass@...il.com> wrote:
>
> On Fri, Jun 14, 2024 at 0:22, Nhat Pham <nphamcs@...il.com> wrote:
> >
> > Taking a step back from the correctness conversation, could you
> > include in the changelog of the patches and cover letter a realistic
> > scenario, along with user space-visible metrics that show (ideally all
> > 4, but at least some of the following):
> >
> > 1. A user problem (that affects performance, or usability, etc.) is happening.
> >
> > 2. The root cause is what we are trying to fix (e.g., in patch 1, we
> > are skipping over memcgs unnecessarily in the global shrinker loop).
> >
> > 3. The fix alleviates the root cause in 2.
> >
> > 4. The userspace-visible problem goes away or is less serious.
> >
>
> Thank you for your suggestions.
> For quick response before submitting v2,

Thanks for all the info, this should be in the cover letter or commit
messages in some shape or form.

>
> 1.
> The visible issue is that pageout/in operations from active processes
> are slow when zswap is near its max pool size. This is particularly
> significant on small memory systems, where total swap usage exceeds
> what zswap can store. This means that old pages occupy most of the
> zswap pool space, and recent pages use swap disk directly.

This should be a transient state though, right? Once the shrinker
kicks in, it should write back the old pages and make space for the
hot ones. Which takes us to our next point.

>
> 2.
> This issue is caused by zswap keeping the pool size near 100%. Since
> the shrinker fails to shrink the pool to accept_thr_percent and zswap
> rejects incoming pages, rejection occurs more frequently than it
> should. The rejected pages are directly written to disk while zswap
> protects old pages from eviction, leading to slow pageout/in
> performance for recent pages on the swap disk.

Why is the shrinker failing? IIUC the first two patches fix two
cases where the shrinker stumbles upon offline memcgs, or memcgs with
no zswapped pages. Are these cases common enough in your use case that
every single time the shrinker runs it hits MAX_RECLAIM_RETRIES before
putting the zswap usage below accept_thr_percent?

This would be surprising given that we should be restarting the
shrinker with every swapout attempt until we can accept pages again.

I guess one could construct a malicious case where there are some
sticky offline memcgs, and all the memcgs that actually have zswap
pages come after it in the iteration order.
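
To make that scenario concrete, here is a rough userspace toy of a
round-robin shrinker with a retry limit. It is only a sketch, not the
code in mm/zswap.c: struct toy_memcg, toy_total(), toy_shrink() and
MAX_RETRIES are made-up names, the retry limit of 16 is an assumed
stand-in for MAX_RECLAIM_RETRIES, and writing back one page per visit
is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_RETRIES 16 /* assumed stand-in for MAX_RECLAIM_RETRIES */

/* Hypothetical, minimal view of a memcg for this toy model. */
struct toy_memcg {
	bool online;
	unsigned long zswap_pages;
};

/* Total pages currently held in the toy pool. */
static unsigned long toy_total(const struct toy_memcg *cgs, size_t n)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += cgs[i].zswap_pages;
	return sum;
}

/*
 * One shrinker run: walk the memcgs round-robin starting at *cursor and
 * write back one page per visit, until the pool is at or below thr or
 * MAX_RETRIES visits have failed. Visiting an offline or empty memcg
 * counts as a failure. Returns the number of pages written back.
 */
static unsigned long toy_shrink(struct toy_memcg *cgs, size_t n,
				size_t *cursor, unsigned long thr)
{
	unsigned long written = 0;
	int failures = 0;

	assert(n > 0);
	while (toy_total(cgs, n) > thr) {
		struct toy_memcg *cg = &cgs[*cursor];

		*cursor = (*cursor + 1) % n; /* cursor persists across runs */
		if (!cg->online || cg->zswap_pages == 0) {
			if (++failures == MAX_RETRIES)
				break;
			continue;
		}
		cg->zswap_pages--;
		written++;
	}
	return written;
}
```

In this toy, with 16 offline memcgs ahead of the only memcg that holds
pages, the first run gives up before writing anything back. The next
run resumes from the saved cursor, but it writes back only a single
page before it runs into the same 16 failures again.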

Could you shed more light on this? What does the setup look like?
How many memcgs are there, how many of them use zswap, and how many
offline memcgs are you observing?

I am not saying we shouldn't fix these problems anyway, I am just
trying to understand how we got into this situation to begin with.

>
> 3.
> If the pool size were shrunk proactively, rejection by pool limit hits
> would be less likely. New incoming pages could be accepted as the pool
> gains some space in advance, while older pages are written back in the
> background. zswap would then be filled with recent pages, as expected
> in the LRU logic.

I suspect if patches 1 and 2 fix your problem, the shrinker invoked
from reclaim should be doing this sort of "proactive shrinking".

I agree that the current hysteresis around accept_thr_percent is not
good enough, but I am surprised you are hitting the pool limit if the
shrinker is being run during reclaim.
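
For reference, the hysteresis I am referring to can be modeled roughly
as below. Again, this is a userspace sketch under assumptions rather
than the kernel code: struct toy_pool and toy_can_accept() are made-up
names, and the threshold is simply computed as a percentage of the
maximum pool size.

```c
#include <stdbool.h>

/* Hypothetical pool state for this toy model. */
struct toy_pool {
	unsigned long cur_pages;
	unsigned long max_pages;
	unsigned int accept_thr_percent;
	bool reached_full;
	unsigned long limit_hit;
};

/*
 * Decide whether a new page may be stored. Once the pool hits its
 * maximum, stores keep being rejected until usage has dropped to the
 * acceptance threshold or below.
 */
static bool toy_can_accept(struct toy_pool *p)
{
	unsigned long thr = p->max_pages * p->accept_thr_percent / 100;

	if (p->cur_pages >= p->max_pages) {
		p->limit_hit++;
		p->reached_full = true;
		return false;
	}
	if (p->reached_full) {
		if (p->cur_pages > thr)
			return false;
		p->reached_full = false;
	}
	return true;
}
```

With max_pages = 100 and accept_thr_percent = 90, a pool that reaches
100 pages rejects stores at 95 pages as well, and only accepts again
once usage is down to 90.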

>
> Patch 1 and 2 make the shrinker reduce the pool to accept_thr_percent.
> Patch 3 makes zswap_store trigger the shrinker before reaching the max
> pool size. With this series, zswap will prepare some space to reduce
> the probability of problematic pool_limit_hit situation, thus reducing
> slow reclaim and the page priority inversion against LRU.
>
> 4.
> Once proactive shrinking reduces the pool size, pageouts complete
> instantly as long as the space prepared by shrinking can store the
> direct reclaim. If an admin sees a large pool_limit_hit, lowering
> accept_threshold_percent will improve active process performance.

I agree that proactive shrinking is preferable to waiting until we hit
the pool limit and then refusing pages until usage drops below the
acceptance threshold. I am just trying to understand whether such a
proactive shrinking mechanism is needed if the reclaim shrinker for
zswap is being used, and how the two would work together.