linux-kernel - Re: [PATCH] mm/zswap: Improve with alloc

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAKEwX=PKBaqcpFwXpf1-M2nZ+ytFv2oCwxRVCe+CC5++rhgY=A@mail.gmail.com>
Date: Thu, 18 Jan 2024 10:32:51 -0800
From: Nhat Pham <nphamcs@...il.com>
To: Johannes Weiner <hannes@...xchg.org>
Cc: Yosry Ahmed <yosryahmed@...gle.com>, Ronald Monthero <debug.penguin32@...il.com>, 
	sjenning@...hat.com, ddstreet@...e.org, vitaly.wool@...sulko.com, 
	akpm@...ux-foundation.org, chrisl@...nel.org, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm/zswap: Improve with alloc_workqueue() call

On Thu, Jan 18, 2024 at 9:39 AM Johannes Weiner <hannes@...xchg.org> wrote:
>
> On Thu, Jan 18, 2024 at 09:06:43AM -0800, Yosry Ahmed wrote:
> > > > > On a different note, I wonder if it would help to perform synchronous
> > > > > reclaim here instead. With our current design, the zswap store failure
> > > > > (due to global limit hit) would leave the incoming page going to swap
> > > > > instead, creating an LRU inversion. Not sure if that's ideal.
> > > >
> > > > The global shrink path keeps reclaiming until zswap can accept again
> > > > (by default, that means reclaiming 10% of the total limit). I think
> > > > this is too expensive to be done synchronously.
> > >
> > > That thresholding code is a bit weird right now.
> > >
> > > It wakes the shrinker and rejects at the same time. We're guaranteed
> > > to see rejections, even if the shrinker has no trouble flushing some
> > > entries a split second later.
> > >
> > > It would make more sense to wake the shrinker at e.g. 95% full and
> > > have it run until 90%.

Yep, we should be reclaiming zswap objects way ahead of the pool
limit. Hence the new shrinker, which is memory pressure-driven (i.e
independent of zswap internal limits), and will typically be triggered
even if the pool is not full. During experiments, I never observe the
pool becoming full, with the default settings. I'd be happy to extend
it (or build in extra shrinking logic) to cover these pool limits too,
if it turns out to be necessary.

> > >
> > > But with that in place we also *should* do synchronous reclaim once we
> > > hit 100%. Just enough to make room for the store. This is important to
> > > catch the case where reclaim rate exceeds swapout rate. Rejecting and
> > > going to swap means the reclaimer will be throttled down to IO rate
> > > anyway, and the app latency isn't any worse. But this way we keep the
> > > pipeline alive, and keep swapping out the oldest zswap entries,
> > > instead of rejecting and swapping what would be the hottest ones.
> >
> > I fully agree with the thresholding code being weird, and with waking
> > up the shrinker before the pool is full. What I don't understand is
> > how we can do synchronous reclaim when we hit 100% and still respect
> > the acceptance threshold :/
> >
> > Are you proposing we change the semantics of the acceptance threshold
> > to begin with?
>
> I kind of am. It's worth looking at the history of this knob.
>
> It was added in 2020 by 45190f01dd402112d3d22c0ddc4152994f9e1e55, and
> from the changelogs and the code in this patch I do not understand how
> this was supposed to work.
>
> It also *didn't* work for very basic real world applications. See
> Domenico's follow-up (e0228d590beb0d0af345c58a282f01afac5c57f3), which
> effectively reverted it to get halfway reasonable behavior.
>
> If there are no good usecases for this knob, then I think it makes
> sense to phase it out again.

Yeah this was my original proposal - remove this knob altogether :)
Based on a cursory read, it just seems like zswap was originally
trying to shrink (synchronously) one "object", then try to check if
the pool size is now under the limit. This is indeed insufficient.
However, I'm not quite convinced by the solution (hysteresis) either.

Maybe we can synchronously shrink a la Domenico, i.e until the pool
can accept new pages, but this time capacity-based (maybe under the
limit + some headroom, 1 page for example)? This is just so that the
immediate incoming zswap store succeeds - we can still have the
shrinker freeing up space later on (or maybe keep an asynchronous
pool-limit based shrinker around).