lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z6m4tEoiUBNBmIjV@gpd3>
Date: Mon, 10 Feb 2025 09:28:36 +0100
From: Andrea Righi <arighi@...dia.com>
To: Yury Norov <yury.norov@...il.com>
Cc: Tejun Heo <tj@...nel.org>, David Vernet <void@...ifault.com>,
	Changwoo Min <changwoo@...lia.com>, Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Juri Lelli <juri.lelli@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
	Valentin Schneider <vschneid@...hat.com>, Ian May <ianm@...dia.com>,
	bpf@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/6] mm/numa: Introduce numa_nearest_nodemask()

Hi Yury,

On Sun, Feb 09, 2025 at 12:40:15PM -0500, Yury Norov wrote:
> On Fri, Feb 07, 2025 at 09:40:48PM +0100, Andrea Righi wrote:
> > Introduce the new helper numa_nearest_nodemask() to find the closest
> > node, in a specified nodemask and state, from a given starting node.
> > 
> > Returns MAX_NUMNODES if no node is found.
> > 
> > Cc: Yury Norov <yury.norov@...il.com>
> > Signed-off-by: Andrea Righi <arighi@...dia.com>
> > ---
> >  include/linux/nodemask_types.h |  6 +++++-
> >  include/linux/numa.h           |  8 +++++++
> >  mm/mempolicy.c                 | 38 ++++++++++++++++++++++++++++++++++
> >  3 files changed, 51 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/nodemask_types.h b/include/linux/nodemask_types.h
> > index 6b28d97ea6ed0..8d0b7a66c3a49 100644
> > --- a/include/linux/nodemask_types.h
> > +++ b/include/linux/nodemask_types.h
> > @@ -5,6 +5,10 @@
> >  #include <linux/bitops.h>
> >  #include <linux/numa.h>
> >  
> > -typedef struct { DECLARE_BITMAP(bits, MAX_NUMNODES); } nodemask_t;
> > +struct nodemask {
> > +	DECLARE_BITMAP(bits, MAX_NUMNODES);
> > +};
> > +
> > +typedef struct nodemask nodemask_t;
> >  
> >  #endif /* __LINUX_NODEMASK_TYPES_H */
> > diff --git a/include/linux/numa.h b/include/linux/numa.h
> > index 3567e40329ebc..a549b87d1fca5 100644
> > --- a/include/linux/numa.h
> > +++ b/include/linux/numa.h
> > @@ -27,6 +27,8 @@ static inline bool numa_valid_node(int nid)
> >  #define __initdata_or_meminfo __initdata
> >  #endif
> >  
> > +struct nodemask;
> 
> Numa should include this via linux/nodemask_types.h, or maybe
> nodemask.h.

Hm... nodemask_types.h needs to include numa.h to resolve MAX_NUMNODES,
Maybe we can move numa_nearest_nodemask() to linux/nodemask.h?

> 
> > +
> >  #ifdef CONFIG_NUMA
> >  #include <asm/sparsemem.h>
> >  
> > @@ -38,6 +40,7 @@ void __init alloc_offline_node_data(int nid);
> >  
> >  /* Generic implementation available */
> >  int numa_nearest_node(int node, unsigned int state);
> > +int numa_nearest_nodemask(int node, unsigned int state, struct nodemask *mask);
> >  
> >  #ifndef memory_add_physaddr_to_nid
> >  int memory_add_physaddr_to_nid(u64 start);
> > @@ -55,6 +58,11 @@ static inline int numa_nearest_node(int node, unsigned int state)
> >  	return NUMA_NO_NODE;
> >  }
> >  
> > +static inline int numa_nearest_nodemask(int node, unsigned int state, struct nodemask *mask)
> > +{
> > +	return NUMA_NO_NODE;
> > +}
> > +
> >  static inline int memory_add_physaddr_to_nid(u64 start)
> >  {
> >  	return 0;
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 162407fbf2bc7..1cfee509c7229 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -196,6 +196,44 @@ int numa_nearest_node(int node, unsigned int state)
> >  }
> >  EXPORT_SYMBOL_GPL(numa_nearest_node);
> >  
> > +/**
> > + * numa_nearest_nodemask - Find the node in @mask at the nearest distance
> > + *			   from @node.
> > + *
> 
> So, I have a feeling about this whole naming scheme. At first, this
> function (and the existing numa_nearest_node()) searches for something,
> but doesn't begin with find_, search_ or similar. Second, the naming
> of existing numa_nearest_node() doesn't reflect that it searches
> against the state. Should we always include some state for search? If
> so, we can skip mentioning the state, otherwise it should be in the
> name, I guess...
> 
> The problem is that I have no idea for better naming, and I have no
> understanding about the future of this functions family. If it's just
> numa_nearest_node() and numa_nearest_nodemask(), I'm OK to go this
> way. If we'll add more flavors similarly to find_bit() family, we
> could probably discuss a naming scheme.
> 
> Also, mm/mempolicy.c is a historical place for them, but maybe we need
> to move it somewhere else?
> 
> Any thoughts appreciated.

Personally I think adding "find_" to the name would be a bit redundant, as
it seems quite obvious that it's finding the nearest node. It sounds a bit
like the get_something() discussion and we can just use something().

About adding "_state" to the name, it'd make sense since we have
for_each_node_state/for_each_node(), but we would need to change
numa_nearest_node() -> numa_nearest_node_state((), that is beyond the scope
of this patch set.

If I had to design this completely from scratch I'd probably propose
something like this:
 - nearest_node_state(node, state)
 - nearest_node(node) -> nearest_node_state(node, N_POSSIBLE)
 - nearest_node_nodemask(node, nodemask) -> here the state can be managed
   with the nodemask (as you suggested below)

But again, this is probably a more generic discussion that can be addressed
in a separate thread.

> 
> > + * @node: the node to start the search from.
> > + * @state: the node state to filter nodes by.
> > + * @mask: a pointer to a nodemask representing the allowed nodes.
> > + *
> > + * This function iterates over all nodes in the given state and calculates
> > + * the distance to the starting node.
> > + *
> > + * Returns the node ID in @mask that is the closest in terms of distance
> > + * from @node, or MAX_NUMNODES if no node is found.
> > + */
> > +int numa_nearest_nodemask(int node, unsigned int state, nodemask_t *mask)
> 
> Your only user calls the function with N_POSSIBLE:
> 
>   s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, int node, u64 flags)
>   {
>         nodemask_t unvisited = NODE_MASK_ALL;
> 
>         if (!static_branch_maybe(CONFIG_NUMA, &scx_builtin_idle_per_node))
>                return pick_idle_cpu_from_node(cpus_allowed, NUMA_NO_NODE, flags);
> 
> 
>         for_each_numa_node(node, unvisited, N_POSSIBLE)
>                 do_something();
>   }
> 
> Which means you don't need the state at all. Even more, you don't
> need to initialize the unvisited mask before checking the static
> branch:
> 
>   s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, int node, u64 flags)
>   {
>         nodemask_t unvisited;
> 
>         if (!static_branch_maybe(CONFIG_NUMA, &scx_builtin_idle_per_node))
>                return pick_idle_cpu_from_node(cpus_allowed, NUMA_NO_NODE, flags);
> 
>         nodes_clear(unvisited);
> 
>         for_each_numa_node(node, unvisited)
>                 do_something();
>   }
> 
> 
> If you need some state other than N_POSSIBLE, you can do it similarly:
>         
>         nodemask_complement(unvisited, N_CPU);
> 
>         /* Only N_CPU nodes iterated */
>         for_each_numa_node(node, unvisited)
>                 do_something();

Good point. I think we can implicitly assume N_POSSIBLE and if we need to
filter only a certain state in the future, we can enforce that via the
nodemask.

> 
> 
> > +{
> > +	int dist, n, min_dist = INT_MAX, min_node = MAX_NUMNODES;
> > +
> > +	if (node == NUMA_NO_NODE)
> > +		return MAX_NUMNODES;
> > +
> > +	if (node_state(node, state) && node_isset(node, *mask))
> > +		return node;
> 
> This is correct, but why do we need this special case? If distance to
> local node is always 0, and distance to remote node is always greater
> than 0, the normal search will return local node, right? Is that a
> performance trick? If so, can you put a comment please? Otherwise,
> maybe just drop it? 

Yeah we can probably just drop it, I don't it gives much benefit in terms
of performance. And the special optimiation case of searching only in one
node can be managed already by the caller via SCX_PICK_IDLE_IN_NODE.

> 
> > +
> > +	for_each_node_state(n, state) {
> > +		if (!node_isset(n, *mask))
> > +			continue;
> 
> for_each_node_state_and_mask(n, state, mask)
> 
> Or if you take the above consideration, just
>         for_each_node_mask(n, mask)       

Ok, that's much better.

Thanks,
-Andrea

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ