linux-kernel - Re: [PATCH] sched_ext: Introduce NUMA awareness to the default idle selection policy


Message-ID: <Zxs3x5EUMQCQkpJX@gpd3>
Date: Fri, 25 Oct 2024 08:16:39 +0200
From: Andrea Righi <arighi@...dia.com>
To: Tejun Heo <tj@...nel.org>
Cc: David Vernet <void@...ifault.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched_ext: Introduce NUMA awareness to the default idle
 selection policy

On Thu, Oct 24, 2024 at 09:15:58AM -1000, Tejun Heo wrote:
...
> > @@ -3156,7 +3210,8 @@ static inline const struct cpumask *llc_domain(struct task_struct *p, s32 cpu)
> >  static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
> >                             u64 wake_flags, bool *found)
> >  {
> > -     const struct cpumask *llc_cpus = llc_domain(p, prev_cpu);
> > +     const struct cpumask *llc_cpus = scx_domain(p, prev_cpu, SCX_DOM_LLC);
> > +     const struct cpumask *numa_cpus = scx_domain(p, prev_cpu, SCX_DOM_NUMA);
> 
> This feels like a lot of code which can just be:
> 
>         const struct cpumask *llc_cpus = NULL, *numa_cpus = NULL;
> 
> #ifdef CONFIG_SCHED_MC
>         llc_cpus = rcu_dereference(per_cpu(sd_llc, cpu));
>         numa_cpus = rcu_dereference(per_cpu(sd_numa, cpu));
> #endif
> 

Yeah, I can definitely simplify this part and get rid of some
boilerplate.

> >       s32 cpu;
> >
> >       *found = false;
> > @@ -3226,6 +3281,15 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
> >                               goto cpu_found;
> >               }
> >
> > +             /*
> > +              * Search for any fully idle core in the same NUMA node.
> > +              */
> > +             if (numa_cpus) {
> > +                     cpu = scx_pick_idle_cpu(numa_cpus, SCX_PICK_IDLE_CORE);
> > +                     if (cpu >= 0)
> > +                             goto cpu_found;
> > +             }
> 
> I'm not convinced about the argument that always doing extra pick is
> beneficial. Sure, the overhead is minimal but isn't it also trivial to avoid
> by just testing llc_cpus == numa_cpus (they resolve to the same cpumasks on
> non-NUMA machines, right)? Taking a step further, the topology information
> is really static and can be determined during boot. Wouldn't it make more
> sense to just skip the unnecessary steps depending on topology? I'm not sure
> the difference would be measurable but if you care we can make them
> static_keys too.

Right, on non-NUMA machines llc_cpus and numa_cpus both resolve to the
same CPUs. Also, on systems with a single shared LLC, llc_cpus and
numa_cpus resolve to all CPUs, so in this case we can skip both steps.
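
A rough userspace sketch of that check, just to illustrate the idea (this
is not the kernel code: mask_t, struct idle_steps and plan_idle_steps()
are hypothetical names, and a 64-bit bitmask stands in for a cpumask):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: bit N set = CPU N is in the mask. */
typedef uint64_t mask_t;

struct idle_steps {
	bool try_llc;	/* search the LLC domain of prev_cpu */
	bool try_numa;	/* search the NUMA node of prev_cpu */
};

/*
 * Decide which scoped idle searches are worth running before the final
 * system-wide search. @llc and @numa are the domains containing prev_cpu.
 */
static struct idle_steps plan_idle_steps(mask_t all_cpus, mask_t llc, mask_t numa)
{
	struct idle_steps s;

	/* Both domains must be subsets of the available CPUs. */
	assert((llc & ~all_cpus) == 0 && (numa & ~all_cpus) == 0);

	/* An LLC domain spanning every CPU adds nothing over the global search. */
	s.try_llc = llc != all_cpus;
	/* A NUMA node equal to the LLC domain, or to all CPUs, repeats a search. */
	s.try_numa = numa != llc && numa != all_cpus;
	return s;
}
```

With a single shared LLC (llc == numa == all_cpus) both flags come out
false, so only the system-wide search runs; with one LLC per NUMA node
only the LLC step is kept.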

Maybe using static_keys is the best option: this way we can be sure that
we won't add any overhead on non-NUMA / single-LLC systems compared to
the previous scx_select_cpu_dfl() implementation.
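
To make the boot-time approach concrete, here is a minimal userspace
sketch, assuming the topology never changes after init. Plain bool flags
stand in for static keys (in the kernel these would be declared with
DEFINE_STATIC_KEY_FALSE() and tested with static_branch_likely() or
static_branch_unlikely(), which patch the branch out instead of testing
a variable). All names below (mask_t, topology_init(), pick_idle(),
select_cpu()) are hypothetical, and pick_idle() is a simplified stand-in,
not scx_pick_idle_cpu():

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t mask_t;	/* bit N set = CPU N; stand-in for struct cpumask */

/* Stand-ins for static keys: written once by topology_init(), then only read. */
static bool llc_enabled;
static bool numa_enabled;

/* Inspect the (static) topology once and enable only the useful steps. */
static void topology_init(const mask_t *llc_of, const mask_t *numa_of,
			  int nr_cpus, mask_t all_cpus)
{
	llc_enabled = false;
	numa_enabled = false;
	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		/* The LLC step helps only if some LLC is smaller than the system. */
		if (llc_of[cpu] != all_cpus)
			llc_enabled = true;
		/* The NUMA step helps only if it differs from both other scopes. */
		if (numa_of[cpu] != llc_of[cpu] && numa_of[cpu] != all_cpus)
			numa_enabled = true;
	}
}

/* Return the lowest-numbered idle CPU in @scope, or -1 if there is none. */
static int pick_idle(mask_t idle, mask_t scope)
{
	mask_t m = idle & scope;

	for (int cpu = 0; cpu < 64; cpu++)
		if (m & ((mask_t)1 << cpu))
			return cpu;
	return -1;
}

/* LLC first, then NUMA node, then any CPU; disabled steps are skipped. */
static int select_cpu(mask_t idle, mask_t llc, mask_t numa, mask_t all_cpus)
{
	int cpu;

	if (llc_enabled && (cpu = pick_idle(idle, llc)) >= 0)
		return cpu;
	if (numa_enabled && (cpu = pick_idle(idle, numa)) >= 0)
		return cpu;
	return pick_idle(idle, all_cpus);
}
```

On a single-LLC machine topology_init() leaves both flags off, so
select_cpu() goes straight to the system-wide pick.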

I'll do some tests and send a v2.

Thanks for looking at this!
-Andrea