[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKEwX=OEQKdoWbyAO=LKE--ssLzBH0UVhz3EYaCiodpbMtQvKw@mail.gmail.com>
Date: Mon, 31 Mar 2025 16:03:09 -0700
From: Nhat Pham <nphamcs@...il.com>
To: Dan Williams <dan.j.williams@...el.com>
Cc: linux-mm@...ck.org, akpm@...ux-foundation.org, hannes@...xchg.org,
yosry.ahmed@...ux.dev, chengming.zhou@...ux.dev, sj@...nel.org,
kernel-team@...a.com, linux-kernel@...r.kernel.org, gourry@...rry.net,
willy@...radead.org, ying.huang@...ux.alibaba.com,
jonathan.cameron@...wei.com, linux-cxl@...r.kernel.org, minchan@...nel.org,
senozhatsky@...omium.org
Subject: Re: [RFC PATCH 1/2] zsmalloc: let callers select NUMA node to store
the compressed objects
On Mon, Mar 31, 2025 at 3:18 PM Dan Williams <dan.j.williams@...el.com> wrote:
>
> Nhat Pham wrote:
> > Curerntly, zsmalloc does not specify any memory policy when it allocates
> > memory for the compressed objects.
> >
> > Let users select the NUMA node for the memory allocation, through the
> > zpool-based API. Direct callers (i.e zram) should not observe any
> > behavioral change.
> >
> > Signed-off-by: Nhat Pham <nphamcs@...il.com>
> > ---
> > include/linux/zpool.h | 4 ++--
> > mm/zpool.c | 8 +++++---
> > mm/zsmalloc.c | 28 +++++++++++++++++++++-------
> > mm/zswap.c | 2 +-
> > 4 files changed, 29 insertions(+), 13 deletions(-)
> >
> > diff --git a/include/linux/zpool.h b/include/linux/zpool.h
> > index 52f30e526607..0df8722e13d7 100644
> > --- a/include/linux/zpool.h
> > +++ b/include/linux/zpool.h
> > @@ -22,7 +22,7 @@ const char *zpool_get_type(struct zpool *pool);
> > void zpool_destroy_pool(struct zpool *pool);
> >
> > int zpool_malloc(struct zpool *pool, size_t size, gfp_t gfp,
> > - unsigned long *handle);
> > + unsigned long *handle, int *nid);
>
> I agree with Johannes about the policy knob, so I'll just comment on the
> implementation.
>
> Why not just pass a "const int" for @nid, and use "NUMA_NO_NODE" for the
> "default" case. alloc_pages_node_noprof() is already prepared for a
> NUMA_NO_NODE argument.
Gregory and Johannes gave me that suggestion too! However, my
understanding was that alloc_pages_node(NUMA_NO_NODE, ...) !=
alloc_page(...), and I was trying to preserve the latter since it is
the "original behavior" (primarily for !same_node_mode, but also for
zram, which I tried not to change in this patch).
Please correct me if I'm mistaken, but IIUC:
1. alloc_pages_node(NUMA_NO_NODE, ...) would go to the local/closest node:
/*
* Allocate pages, preferring the node given as nid. When nid == NUMA_NO_NODE,
* prefer the current CPU's closest node. Otherwise node must be valid and
* online.
*/
static inline struct page *alloc_pages_node_noprof(int nid, gfp_t gfp_mask,
unsigned int order)
{
if (nid == NUMA_NO_NODE)
nid = numa_mem_id();
2. whereas, alloc_page(...) (i.e the "original" behavior) would
actually adopt the allocation policy of the task entering reclaim, by
calling:
struct page *alloc_frozen_pages_noprof(gfp_t gfp, unsigned order)
{
struct mempolicy *pol = &default_policy;
/*
* No reference counting needed for current->mempolicy
* nor system default_policy
*/
if (!in_interrupt() && !(gfp & __GFP_THISNODE))
pol = get_task_policy(current);
Now, I think the "original" behavior is dumb/broken, and should be
changed altogether. We should probably always pass the page's node id.
On the zswap side, in the next version I'll remove same_node_mode
sysfs knob and always pass the page's node id to zsmalloc and the page
allocator. That will clean up the zpool path per your (and Johannes'
and Gregory's) suggestion.
That still leaves zram though. zram is more complicated than zswap -
it has multiple allocation paths, so I don't want to touch it quite
yet (and preferably a zram maintainer/developer should do it). :) Or
if zram maintainers are happy with NUMA_NO_NODE, then we can
completely get rid of the pointer arguments etc.
Powered by blists - more mailing lists