lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKEwX=Nw8PZYKd4TcC2+VW7URzT67aM0wJyYMu5X01ngbFO_Yg@mail.gmail.com>
Date: Mon, 31 Mar 2025 10:32:01 -0700
From: Nhat Pham <nphamcs@...il.com>
To: Johannes Weiner <hannes@...xchg.org>
Cc: Yosry Ahmed <yosry.ahmed@...ux.dev>, linux-mm@...ck.org, akpm@...ux-foundation.org, 
	chengming.zhou@...ux.dev, sj@...nel.org, kernel-team@...a.com, 
	linux-kernel@...r.kernel.org, gourry@...rry.net, willy@...radead.org, 
	ying.huang@...ux.alibaba.com, jonathan.cameron@...wei.com, 
	dan.j.williams@...el.com, linux-cxl@...r.kernel.org, minchan@...nel.org, 
	senozhatsky@...omium.org
Subject: Re: [RFC PATCH 0/2] zswap: fix placement inversion in memory tiering systems

On Mon, Mar 31, 2025 at 9:53 AM Johannes Weiner <hannes@...xchg.org> wrote:
>
> On Sat, Mar 29, 2025 at 07:53:23PM +0000, Yosry Ahmed wrote:
> > March 29, 2025 at 1:02 PM, "Nhat Pham" <nphamcs@...il.com> wrote:
> >
> > > Currently, systems with CXL-based memory tiering can encounter the
> > > following inversion with zswap: the coldest pages demoted to the CXL
> > > tier can return to the high tier when they are zswapped out,
> > > creating memory pressure on the high tier.
> > > This happens because zsmalloc, zswap's backend memory allocator, does
> > > not enforce any memory policy. If the task reclaiming memory follows
> > > the local-first policy for example, the memory requested for zswap can
> > > be served by the upper tier, leading to the aformentioned inversion.
> > > This RFC fixes this inversion by adding a new memory allocation mode
> > > for zswap (exposed through a zswap sysfs knob), intended for
> > > hosts with CXL, where the memory for the compressed object is requested
> > > preferentially from the same node that the original page resides on.
> >
> > I didn't look too closely, but why not just prefer the same node by
> > default? Why is a knob needed?
>
> +1 It should really be the default.
>
> Even on regular NUMA setups this behavior makes more sense. Consider a
> direct reclaimer scanning nodes in order of allocation preference. If
> it ventures into remote nodes, the memory it compresses there should
> stay there. Trying to shift those contents over to the reclaiming
> thread's preferred node further *increases* its local pressure, and
> provoking more spills. The remote node is also the most likely to
> refault this data again. This is just bad for everybody.

Makes a lot of sense. I'll include this in the v2 of the patch series,
and rephrase this as a generic, NUMA system fix (with CXL as one of
the examples/motivations).

Thanks for the comment, Johannes! I'll remove this knob altogether and
make this the default behavior.

>
> > Or maybe if there's a way to tell the "tier" of the node we can
> > prefer to allocate from the same "tier"?
>
> Presumably, other nodes in the same tier would come first in the
> fallback zonelist of that node, so page_to_nid() should just work.
>
> I wouldn't complicate this until somebody has real systems where it
> does the wrong thing.
>
> My vote is to stick with page_to_nid(), but do it unconditionally.

SGTM.

>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ