Message-ID: <CAKEwX=OVRaUcD8A4HkCZWisNPH+Q9VzOGMJeHnOi40AnHsjjjw@mail.gmail.com>
Date: Sat, 29 Mar 2025 15:13:20 -0700
From: Nhat Pham <nphamcs@...il.com>
To: Yosry Ahmed <yosry.ahmed@...ux.dev>
Cc: linux-mm@...ck.org, akpm@...ux-foundation.org, hannes@...xchg.org,
chengming.zhou@...ux.dev, sj@...nel.org, kernel-team@...a.com,
linux-kernel@...r.kernel.org, gourry@...rry.net, willy@...radead.org,
ying.huang@...ux.alibaba.com, jonathan.cameron@...wei.com,
dan.j.williams@...el.com, linux-cxl@...r.kernel.org, minchan@...nel.org,
senozhatsky@...omium.org
Subject: Re: [RFC PATCH 0/2] zswap: fix placement inversion in memory tiering systems
On Sat, Mar 29, 2025 at 12:53 PM Yosry Ahmed <yosry.ahmed@...ux.dev> wrote:
>
> March 29, 2025 at 1:02 PM, "Nhat Pham" <nphamcs@...il.com> wrote:
>
> > Currently, systems with CXL-based memory tiering can encounter the
> > following inversion with zswap: the coldest pages demoted to the CXL
> > tier can return to the high tier when they are zswapped out,
> > creating memory pressure on the high tier.
> > This happens because zsmalloc, zswap's backend memory allocator, does
> > not enforce any memory policy. If the task reclaiming memory follows
> > the local-first policy for example, the memory requested for zswap can
> > be served by the upper tier, leading to the aforementioned inversion.
> > This RFC fixes this inversion by adding a new memory allocation mode
> > for zswap (exposed through a zswap sysfs knob), intended for
> > hosts with CXL, where the memory for the compressed object is requested
> > preferentially from the same node that the original page resides on.
>
> I didn't look too closely, but why not just prefer the same node by default? Why is a knob needed?
Good question. The knob is there to preserve the old behavior :)
Preferring the same node might not be optimal, or even advisable, for
all setups.
For hosts with node-based memory tiering it is a good idea in general,
but I don't quite know how the kernel could tell that it is running on
such a host.
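To make the difference between the two modes concrete, here is a toy
userspace model of the node choice. It is only a sketch, not code from
the patches: every name in it (pick_node, zs_alloc_mode, toy_nodes) is
hypothetical, and it assumes a two-node host where node 0 is DRAM and
node 1 is CXL.

```c
#include <assert.h>

/* All names below are hypothetical; none come from the patch series. */
enum { NR_NODES = 2 };          /* assumed: node 0 = DRAM, node 1 = CXL */

enum zs_alloc_mode {
	ZS_ALLOC_LOCAL_FIRST,   /* old behavior: follow the reclaimer's node */
	ZS_ALLOC_SAME_NODE,     /* proposed: prefer the original page's node */
};

/* Free pages per node in this toy model. */
struct toy_nodes {
	long free_pages[NR_NODES];
};

/*
 * Pick the node that backs a compressed object.
 * Returns the chosen node id, or -1 if every node is out of memory.
 */
static int pick_node(const struct toy_nodes *n, enum zs_alloc_mode mode,
		     int reclaimer_nid, int page_nid)
{
	int preferred = (mode == ZS_ALLOC_SAME_NODE) ? page_nid : reclaimer_nid;

	assert(preferred >= 0 && preferred < NR_NODES);

	if (n->free_pages[preferred] > 0)
		return preferred;

	/* A preference, not a hard bind: fall back to any node with memory. */
	for (int nid = 0; nid < NR_NODES; nid++)
		if (n->free_pages[nid] > 0)
			return nid;

	return -1;
}
```

With a reclaimer running on node 0 and a cold page sitting on node 1,
the local-first mode picks node 0 (the inversion), while the same-node
mode picks node 1 and only falls back to node 0 once node 1 has no free
memory.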
>
> Or maybe if there's a way to tell the "tier" of the node we can prefer to allocate from the same "tier"?
Is there an abstraction of the "tier" that we can use here?