linux-kernel - Re: [PATCH v3 3/8] sched/topology: Switch to assigning "sd->shared" from s


Message-ID: <a95396aa-33a5-43f5-ace9-d919d1d05b08@intel.com>
Date: Wed, 21 Jan 2026 23:26:28 +0800
From: "Chen, Yu C" <yu.c.chen@...el.com>
To: K Prateek Nayak <kprateek.nayak@....com>
CC: Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt
	<rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman
	<mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, Shrikanth Hegde
	<sshegde@...ux.ibm.com>, "Gautham R. Shenoy" <gautham.shenoy@....com>, "Tim
 Chen" <tim.c.chen@...el.com>, <linux-kernel@...r.kernel.org>, Ingo Molnar
	<mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Juri Lelli
	<juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: [PATCH v3 3/8] sched/topology: Switch to assigning "sd->shared"
 from s_data

On 1/20/2026 7:32 PM, K Prateek Nayak wrote:
> Use the "sched_domain_shared" object allocated in s_data for
> "sd->shared" assignments. Assign "sd->shared" for the topmost
> SD_SHARE_LLC domain before degeneration and rely on the degeneration
> path to correctly pass down the shared object to "sd_llc".
> 
> sd_degenerate_parent() ensures degenerating domains must have the same
> sched_domain_span() which ensures 1:1 passing down of the shared object.
> If the topmost SD_SHARE_LLC domain degenerates, the shared object is
> freed from destroy_sched_domain() when the last reference is dropped.
> 
> build_sched_domains() NULLs out the objects that have been assigned as
> "sd->shared" and the unassigned ones are freed from the __sds_free()
> path.
> 
> Post cpu_attach_domain(), all reclaims of "sd->shared" are handled via
> call_rcu() on the sched_domain object via destroy_sched_domains_rcu().
> 
> Signed-off-by: K Prateek Nayak <kprateek.nayak@....com>
> ---
> Changelog rfc v2..v3:
> 
> o Broke off from a single large patch. Previously
>    https://lore.kernel.org/lkml/20251208092744.32737-3-kprateek.nayak@amd.com/
> ---
>   kernel/sched/topology.c | 34 ++++++++++++++++++++++++----------
>   1 file changed, 24 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 623e8835d322..0f56462fef6f 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -679,6 +679,9 @@ static void update_top_cache_domain(int cpu)
>   	if (sd) {
>   		id = cpumask_first(sched_domain_span(sd));
>   		size = cpumask_weight(sched_domain_span(sd));
> +
> +		/* If sd_llc exists, sd_llc_shared should exist too. */
> +		WARN_ON_ONCE(!sd->shared);
>   		sds = sd->shared;
>   	}
>   
> @@ -727,6 +730,13 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
>   		if (sd_parent_degenerate(tmp, parent)) {
>   			tmp->parent = parent->parent;
>   
> +			/* Pick reference to parent->shared. */
> +			if (parent->shared) {
> +				WARN_ON_ONCE(tmp->shared);
> +				tmp->shared = parent->shared;
> +				parent->shared = NULL;
> +			}
> +
>   			if (parent->parent) {
>   				parent->parent->child = tmp;
>   				parent->parent->groups->flags = tmp->flags;
> @@ -1732,16 +1742,6 @@ sd_init(struct sched_domain_topology_level *tl,
>   		sd->cache_nice_tries = 1;
>   	}
>   
> -	/*
> -	 * For all levels sharing cache; connect a sched_domain_shared
> -	 * instance.
> -	 */
> -	if (sd->flags & SD_SHARE_LLC) {
> -		sd->shared = *per_cpu_ptr(sdd->sds, sd_id);
> -		atomic_inc(&sd->shared->ref);
> -		atomic_set(&sd->shared->nr_busy_cpus, sd_weight);
> -	}
> -
>   	sd->private = sdd;
>   
>   	return sd;
> @@ -2655,8 +2655,19 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
>   		unsigned int imb_span = 1;
>   
>   		for (sd = *per_cpu_ptr(d.sd, i); sd; sd = sd->parent) {
> +			struct sched_domain *parent = sd->parent;
>   			struct sched_domain *child = sd->child;
>   
> +			/* Attach sd->shared to the topmost SD_SHARE_LLC domain. */
> +			if ((sd->flags & SD_SHARE_LLC) &&
> +			    (!parent || !(parent->flags & SD_SHARE_LLC))) {
> +				int llc_id = cpumask_first(sched_domain_span(sd));
> +
> +				sd->shared = *per_cpu_ptr(d.sds, llc_id);

I agree that in the current implementation we use llc_id = "first CPU" to
index into d.sds, and that this value effectively represents the LLC ID.
For cache-aware scheduling, however, we plan to convert llc_id to a logical
ID that is no longer tied to the CPU number. Just my 2 cents: to avoid
confusion, maybe rename the llc_id above to sd_id?
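
To make the distinction concrete, here is a minimal userspace sketch (not
kernel code). The bitmask spans, llc_spans[], span_first() and sd_id_of() are
all made up for illustration; span_first() merely stands in for
cpumask_first(sched_domain_span(sd)). It shows that the value used to index
d.sds is the first CPU of the span, which need not match a logical LLC number:

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 8

/*
 * Hypothetical layout: each LLC span is a bitmask of CPUs.
 * Logical LLC 0 covers CPUs 0-3, logical LLC 1 covers CPUs 4-7.
 */
static const unsigned int llc_spans[] = { 0x0f, 0xf0 };

/* Stand-in for cpumask_first(): lowest set CPU in the span, -1 if empty. */
static int span_first(unsigned int span)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (span & (1u << cpu))
			return cpu;
	}
	return -1;
}

/*
 * Index that would be used for a per-CPU array such as d.sds:
 * the first CPU of the span, not the logical LLC number.
 */
static int sd_id_of(size_t logical_llc)
{
	return span_first(llc_spans[logical_llc]);
}
```

With this layout, logical LLC 1 maps to index 4, so calling the index
"llc_id" would clash with a logical LLC ID of 1 once the two diverge.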

Anyway, I will run some tests on the entire patch set and provide feedback
afterward.

thanks,
Chenyu