Message-ID: <aEBhS-WDH_kaXmVd@gpd4>
Date: Wed, 4 Jun 2025 17:07:55 +0200
From: Andrea Righi <arighi@...dia.com>
To: Yury Norov <yury.norov@...il.com>
Cc: Tejun Heo <tj@...nel.org>, David Vernet <void@...ifault.com>,
Changwoo Min <changwoo@...lia.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched_ext: idle: Skip cross-node search with !CONFIG_NUMA
Hi Yury,
On Wed, Jun 04, 2025 at 10:05:15AM -0400, Yury Norov wrote:
> Hi Andrea!
>
> On Tue, Jun 03, 2025 at 10:22:01AM +0200, Andrea Righi wrote:
> > In the idle CPU selection logic, attempting cross-node searches adds
> > unnecessary complexity when CONFIG_NUMA is disabled.
> >
> > Since there's no meaningful concept of nodes in this case, simplify the
> > logic by restricting the idle CPU search to the current node only.
> >
> > Fixes: 48849271e6611 ("sched_ext: idle: Per-node idle cpumasks")
> > Signed-off-by: Andrea Righi <arighi@...dia.com>
> > ---
> > kernel/sched/ext_idle.c | 8 ++++++++
> > 1 file changed, 8 insertions(+)
> >
> > diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
> > index 66da03cc0b338..8660d9ae40169 100644
> > --- a/kernel/sched/ext_idle.c
> > +++ b/kernel/sched/ext_idle.c
> > @@ -138,6 +138,7 @@ static s32 pick_idle_cpu_in_node(const struct cpumask *cpus_allowed, int node, u
> > goto retry;
> > }
> >
> > +#ifdef CONFIG_NUMA
>
> It would be more natural if you move this inside the function body,
> and not duplicate the function declaration.
I was trying to catch both the function and the per_cpu_unvisited with a
single #ifdef, but I can definitely split that and add another #ifdef
inside the function body.
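
To make sure we mean the same thing, here is a rough, untested userspace sketch of the "single declaration, #ifdef inside the body" layout. The signature is simplified and search_other_nodes() and the _sketch suffix are hypothetical stand-ins, not the actual ext_idle.c code:

```c
#include <errno.h>

/* Hypothetical stand-in for the kernel's s32 type. */
typedef int s32;

#ifdef CONFIG_NUMA
/*
 * Hypothetical placeholder for the real cross-node walk. It only exists
 * so that the sketch also compiles when CONFIG_NUMA is defined.
 */
static s32 search_other_nodes(int node)
{
	(void)node;
	return -EBUSY;	/* pretend every other node is fully busy */
}
#endif

/* One declaration; only the body depends on CONFIG_NUMA. */
static s32 pick_idle_cpu_from_online_nodes_sketch(int node)
{
#ifdef CONFIG_NUMA
	return search_other_nodes(node);
#else
	/* Without NUMA there are no other nodes to search. */
	(void)node;
	return -EBUSY;
#endif
}
```

The per-CPU scratch mask would then need its own #ifdef, which is the split mentioned above.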
>
> > /*
> > * Tracks nodes that have not yet been visited when searching for an idle
> > * CPU across all available nodes.
> > @@ -186,6 +187,13 @@ static s32 pick_idle_cpu_from_online_nodes(const struct cpumask *cpus_allowed, i
> >
> > return cpu;
> > }
> > +#else
> > +static inline s32
> > +pick_idle_cpu_from_online_nodes(const struct cpumask *cpus_allowed, int node, u64 flags)
> > +{
> > + return -EBUSY;
> > +}
>
> This is a misleading errno. The system is not busy, it is disabled. If
> it was a syscall, I would say you should return ENOSYS. ENODATA is
> another candidate. Or do you have a special policy for the subsystem?
So, this function is called only from scx_pick_idle_cpu(), which can still
call pick_idle_cpu_from_online_nodes() even on kernels with !CONFIG_NUMA,
if the BPF scheduler enables the per-node idle cpumask (by setting the
SCX_OPS_BUILTIN_IDLE_PER_NODE flag).

We can return -ENOSYS, but then we still need to return -EBUSY from
scx_pick_idle_cpu(), since its logic is host-wide, so the choice of -EBUSY
was to be consistent with that.

However, I don't have a strong opinion: if you think it's clearer to return
-ENOSYS/-ENODATA from pick_idle_cpu_from_online_nodes(), I can change that,
but I'd still return -EBUSY from scx_pick_idle_cpu().
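
As an illustration of that second option, a minimal, untested sketch of the caller collapsing any failure from the cross-node helper into -EBUSY. All the _sketch functions and their simplified arguments are hypothetical and only model the control flow described above:

```c
#include <errno.h>

/* Hypothetical stand-in for the kernel's s32 type. */
typedef int s32;

/* Models the node-local search: returns a CPU id, or -EBUSY if none is idle. */
static s32 pick_in_node_sketch(int idle_cpu)
{
	return idle_cpu >= 0 ? idle_cpu : -EBUSY;
}

/* Models a !CONFIG_NUMA stub that reports "not available" instead of "busy". */
static s32 pick_from_online_nodes_sketch(void)
{
	return -ENOSYS;
}

/*
 * Models the host-wide entry point: callers only ever see a valid CPU or
 * -EBUSY, whatever errno the cross-node helper used internally.
 */
static s32 scx_pick_idle_cpu_sketch(int idle_cpu_in_node)
{
	s32 cpu = pick_in_node_sketch(idle_cpu_in_node);

	if (cpu >= 0)
		return cpu;

	cpu = pick_from_online_nodes_sketch();
	return cpu >= 0 ? cpu : -EBUSY;
}
```

With this shape the errno chosen for the stub never leaks out to the caller, which is why I consider it mostly a readability choice.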
>
> The above pick_idle_cpu_in_node() doesn't have CONFIG_NUMA protection
> either. Is it safe with !CONFIG_NUMA?
pick_idle_cpu_in_node() is always called with a validated node (when passed
from BPF) or a node from the kernel, and idle_cpumask() handles the
NUMA_NO_NODE case, so that should be fine in theory.
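
For reference, a simplified model of the NUMA_NO_NODE fallback I have in mind. This is a sketch under my own assumptions (the struct, the array size and the _sketch names are made up for illustration), not the actual idle_cpumask() helper:

```c
/* Same value the kernel uses for "no specific node". */
#define NUMA_NO_NODE (-1)

/* Hypothetical upper bound on nodes, only for this sketch. */
#define MAX_NODES_SKETCH 4

/* Hypothetical container for a set of idle masks. */
struct idle_masks_sketch {
	unsigned long cpu;
};

static struct idle_masks_sketch global_masks_sketch;
static struct idle_masks_sketch node_masks_sketch[MAX_NODES_SKETCH];

/*
 * NUMA_NO_NODE selects the host-wide masks; any other value is assumed to
 * be a node id that the caller has already validated.
 */
static struct idle_masks_sketch *idle_cpumask_sketch(int node)
{
	if (node == NUMA_NO_NODE)
		return &global_masks_sketch;
	return &node_masks_sketch[node];
}
```

The sketch relies on the caller-side validation mentioned above: an unvalidated node id would index past the array.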
Thanks,
-Andrea
PS: Tejun already applied this patch to his tree, so I'll send all the
changes as a follow-up patch. At least the original bug is fixed. :)