linux-kernel - Re: [PATCH 3/6] sched_ext: idle: Introduce the concept of allowed CPUs

Message-ID: <Z822PGZLYl1Vima4@gpd3>
Date: Sun, 9 Mar 2025 16:39:40 +0100
From: Andrea Righi <arighi@...dia.com>
To: Tejun Heo <tj@...nel.org>
Cc: David Vernet <void@...ifault.com>, Changwoo Min <changwoo@...lia.com>,
	bpf@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/6] sched_ext: idle: Introduce the concept of allowed
 CPUs

On Sun, Mar 09, 2025 at 04:56:34AM -1000, Tejun Heo wrote:
> Hello,
> 
> On Sat, Mar 08, 2025 at 07:48:42AM +0100, Andrea Righi wrote:
> > > > With this concept the idle CPU selection policy becomes the following:
> > > >  - always prioritize CPUs from fully idle SMT cores (if SMT is enabled),
> > > >  - select the same CPU if it's idle and in the allowed domain,
> > > >  - select an idle CPU within the same LLC domain, if the LLC domain is a
> > > >    subset of the allowed domain,
> > > 
> > > Why not select from the intersection of the same LLC domain and the cpumask?
> > 
> > We could do that, but to guarantee the intersection we need to introduce
> > other temporary cpumasks (one for the LLC intersection and another for the
> > NUMA), which is not a big problem, but it can introduce overhead. And most
> > of the time the LLC group is either a subset of the allowed CPUs or
> > vice-versa, so in this case the current logic already works.
> > 
> > The extra cpumask work is needed only when the allowed cpumask spans
> > multiple partial LLCs, which should be rare. So maybe in such cases, we
> > could tolerate the additional overhead of updating an additional temporary
> > cpumask to ensure proper hierarchical semantics (maintaining consistency
> > with the topology hierarchy). WDYT?
> 
> Would just using a pre-allocated cpumask to do pre-and on @cpus_allowed
> work? This won't only be used for topology support (e.g. soft partitioning
> in scx_layered and scx_mitosis may want to use multi-topology-unit spanning
> subsets) and I'm not sure assuming and optimizing for that is a good idea
> for generic API.

We can pre-allocate two additional (per-cpu) cpumasks to do:
 - cpumask_and(numa_cpus, numa_span(cpu), cpus_allowed)
 - cpumask_and(llc_cpus, llc_span(cpu), cpus_allowed)

and update/use them only when needed. This way the API stays generic,
without making any implicit assumptions about @cpus_allowed.

If you don't see any issues, I'll go ahead with this approach.
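
To make the intersection idea concrete, here is a minimal userspace sketch,
not the actual patch. It assumes a plain 64-bit word stands in for struct
cpumask, and the names mask_t, mask_first() and pick_idle_cpu() are
hypothetical. The SMT step of the policy is left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for struct cpumask: one bit per CPU, up to 64 CPUs. */
typedef uint64_t mask_t;

/* Return the lowest CPU set in @m, or -1 if @m is empty. */
int mask_first(mask_t m)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (m & ((mask_t)1 << cpu))
			return cpu;
	return -1;
}

/*
 * Hypothetical idle CPU selection: prev_cpu first, then the LLC domain
 * restricted to the allowed CPUs, then the NUMA node restricted to the
 * allowed CPUs, then any allowed idle CPU. Returns -1 if nothing is idle.
 */
int pick_idle_cpu(int prev_cpu, mask_t idle, mask_t llc_span,
		  mask_t numa_span, mask_t allowed)
{
	/* Mirrors cpumask_and(llc_cpus, llc_span(cpu), cpus_allowed). */
	mask_t llc_cpus = llc_span & allowed;
	/* Mirrors cpumask_and(numa_cpus, numa_span(cpu), cpus_allowed). */
	mask_t numa_cpus = numa_span & allowed;
	int cpu;

	assert(prev_cpu >= 0 && prev_cpu < 64);

	/* Same CPU, if it is idle and allowed. */
	if (idle & allowed & ((mask_t)1 << prev_cpu))
		return prev_cpu;

	/* Idle CPU in the intersection of the LLC domain and allowed CPUs. */
	cpu = mask_first(idle & llc_cpus);
	if (cpu >= 0)
		return cpu;

	/* Idle CPU in the intersection of the NUMA node and allowed CPUs. */
	cpu = mask_first(idle & numa_cpus);
	if (cpu >= 0)
		return cpu;

	/* Any idle CPU in the allowed domain. */
	return mask_first(idle & allowed);
}
```

With allowed CPUs 4-7 and an LLC spanning CPUs 2-5, the LLC is neither a
subset nor a superset of the allowed mask, yet the LLC step can still pick
CPU 4 or 5, which is the case the intersection is meant to cover.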

> 
> We can do something simple now. Note that if we want to optimize it, we can
> introduce cpumask_any_and_and_distribute(). There already is
> cpumask_first_and_and(), so the pattern isn't new and the only extra bitops
> we need to add is find_next_and_and_bit_wrap(). There's already
> find_first_and_and_bit(), so I don't think it will be all that difficult to
> add.

Yes, it'd be really nice to have cpumask_any_and_and_distribute(), but I
agree that we can start simple and provide this as a separate improvement
later on. Looks like a good plan.
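
For reference, a rough userspace sketch of what the three-way helpers
discussed above might look like. Neither find_next_and_and_bit_wrap() nor
cpumask_any_and_and_distribute() exists yet, so the "_sketch" functions below
are purely hypothetical, operate on single 64-bit words instead of real
bitmaps, and only loosely model the wrap-around and "distribute" behaviour of
the existing two-mask helpers as I understand them.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical: find the first bit at or after @offset that is set in all
 * three words, wrapping around to bit 0. Returns @nbits if no bit is set
 * in the three-way intersection.
 */
unsigned int find_next_and_and_bit_wrap_sketch(uint64_t a, uint64_t b,
					       uint64_t c, unsigned int nbits,
					       unsigned int offset)
{
	uint64_t m = a & b & c;

	assert(nbits > 0 && nbits <= 64);

	for (unsigned int i = 0; i < nbits; i++) {
		unsigned int bit = (offset + i) % nbits;

		if (m & ((uint64_t)1 << bit))
			return bit;
	}
	return nbits;
}

/* Last bit handed out; a single global here, an assumption of this sketch. */
static unsigned int distribute_prev;

/*
 * Hypothetical: pick a bit from the three-way intersection, starting the
 * search just after the previously returned bit so that successive calls
 * spread over the candidates. Returns @nbits if the intersection is empty.
 */
unsigned int any_and_and_distribute_sketch(uint64_t a, uint64_t b,
					   uint64_t c, unsigned int nbits)
{
	unsigned int next;

	next = find_next_and_and_bit_wrap_sketch(a, b, c, nbits,
						 distribute_prev + 1);
	if (next < nbits)
		distribute_prev = next;
	return next;
}
```

The point of such a helper would be to avoid the temporary cpumask entirely:
the idle mask, the topology span and @cpus_allowed are combined on the fly
while scanning.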

Thanks,
-Andrea