Message-ID: <CAKEwX=OEQKdoWbyAO=LKE--ssLzBH0UVhz3EYaCiodpbMtQvKw@mail.gmail.com>
Date: Mon, 31 Mar 2025 16:03:09 -0700
From: Nhat Pham <nphamcs@...il.com>
To: Dan Williams <dan.j.williams@...el.com>
Cc: linux-mm@...ck.org, akpm@...ux-foundation.org, hannes@...xchg.org, 
	yosry.ahmed@...ux.dev, chengming.zhou@...ux.dev, sj@...nel.org, 
	kernel-team@...a.com, linux-kernel@...r.kernel.org, gourry@...rry.net, 
	willy@...radead.org, ying.huang@...ux.alibaba.com, 
	jonathan.cameron@...wei.com, linux-cxl@...r.kernel.org, minchan@...nel.org, 
	senozhatsky@...omium.org
Subject: Re: [RFC PATCH 1/2] zsmalloc: let callers select NUMA node to store
 the compressed objects

On Mon, Mar 31, 2025 at 3:18 PM Dan Williams <dan.j.williams@...el.com> wrote:
>
> Nhat Pham wrote:
> > Currently, zsmalloc does not specify any memory policy when it allocates
> > memory for the compressed objects.
> >
> > Let users select the NUMA node for the memory allocation, through the
> > zpool-based API. Direct callers (i.e., zram) should not observe any
> > behavioral change.
> >
> > Signed-off-by: Nhat Pham <nphamcs@...il.com>
> > ---
> >  include/linux/zpool.h |  4 ++--
> >  mm/zpool.c            |  8 +++++---
> >  mm/zsmalloc.c         | 28 +++++++++++++++++++++-------
> >  mm/zswap.c            |  2 +-
> >  4 files changed, 29 insertions(+), 13 deletions(-)
> >
> > diff --git a/include/linux/zpool.h b/include/linux/zpool.h
> > index 52f30e526607..0df8722e13d7 100644
> > --- a/include/linux/zpool.h
> > +++ b/include/linux/zpool.h
> > @@ -22,7 +22,7 @@ const char *zpool_get_type(struct zpool *pool);
> >  void zpool_destroy_pool(struct zpool *pool);
> >
> >  int zpool_malloc(struct zpool *pool, size_t size, gfp_t gfp,
> > -                     unsigned long *handle);
> > +                     unsigned long *handle, int *nid);
>
> I agree with Johannes about the policy knob, so I'll just comment on the
> implementation.
>
> Why not just pass a "const int" for @nid, and use "NUMA_NO_NODE" for the
> "default" case. alloc_pages_node_noprof() is already prepared for a
> NUMA_NO_NODE argument.

Gregory and Johannes gave me that suggestion too! However, my
understanding was that alloc_pages_node(NUMA_NO_NODE, ...) !=
alloc_page(...), and I was trying to preserve the latter since it is
the "original behavior" (primarily for !same_node_mode, but also for
zram, which I tried not to change in this patch).

Please correct me if I'm mistaken, but IIUC:

1. alloc_pages_node(NUMA_NO_NODE, ...) would go to the local/closest node:

/*
 * Allocate pages, preferring the node given as nid. When nid == NUMA_NO_NODE,
 * prefer the current CPU's closest node. Otherwise node must be valid and
 * online.
 */
static inline struct page *alloc_pages_node_noprof(int nid, gfp_t gfp_mask,
   unsigned int order)
{
    if (nid == NUMA_NO_NODE)
        nid = numa_mem_id();


2. whereas alloc_page(...) (i.e., the "original" behavior) would
actually adopt the allocation policy of the task entering reclaim, by
calling:

struct page *alloc_frozen_pages_noprof(gfp_t gfp, unsigned order)
{
    struct mempolicy *pol = &default_policy;

    /*
     * No reference counting needed for current->mempolicy
     * nor system default_policy
     */
    if (!in_interrupt() && !(gfp & __GFP_THISNODE))
        pol = get_task_policy(current);
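
To make the contrast concrete, here is a toy userspace sketch (the stub
numa_mem_id() and the fake task policy are illustrative stand-ins, not the
actual kernel code): alloc_pages_node(NUMA_NO_NODE, ...) falls back to the
CPU-local node, while alloc_page(...) defers to the reclaiming task's
mempolicy.

```c
#include <assert.h>

#define NUMA_NO_NODE (-1)

/* Stand-in for the kernel helper: pretend node 0 is CPU-local. */
static int numa_mem_id(void)
{
	return 0;
}

/* Pretend the reclaiming task's mempolicy prefers node 1. */
static int task_policy_nid = 1;

/* alloc_pages_node(NUMA_NO_NODE, ...) path: fall back to the local node. */
static int nid_via_alloc_pages_node(int nid)
{
	if (nid == NUMA_NO_NODE)
		nid = numa_mem_id();
	return nid;
}

/* alloc_page(...) path: adopt the task's mempolicy instead. */
static int nid_via_alloc_page(void)
{
	return task_policy_nid;
}
```

So with a non-default mempolicy in place, the two calls can land on
different nodes even when the caller expresses "no preference".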

Now, I think the "original" behavior is dumb/broken, and should be
changed altogether. We should probably always pass the page's node id.
On the zswap side, in the next version I'll remove the same_node_mode
sysfs knob and always pass the page's node id to zsmalloc and the page
allocator. That will clean up the zpool path per your (and Johannes'
and Gregory's) suggestion.
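
In other words, the zswap change amounts to something like this toy sketch
(struct page, page_to_nid() and pick_store_nid() here are illustrative
stand-ins, not the real patch):

```c
#include <assert.h>

/* Toy model: a page that knows which NUMA node it lives on. */
struct page {
	int nid;
};

static int page_to_nid(const struct page *p)
{
	return p->nid;
}

/*
 * Hypothetical zswap-side helper: always store the compressed object
 * on the node of the page being compressed, rather than leaving the
 * choice to NUMA_NO_NODE fallback or the task's mempolicy.
 */
static int pick_store_nid(const struct page *p)
{
	return page_to_nid(p);
}
```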

That still leaves zram though. zram is more complicated than zswap -
it has multiple allocation paths, so I don't want to touch it quite
yet (and preferably a zram maintainer/developer should do it). :) Or
if zram maintainers are happy with NUMA_NO_NODE, then we can
completely get rid of the pointer arguments etc.
