Message-ID: <857e86a9-9007-4942-b005-1574c919ad6b@intel.com>
Date: Fri, 12 Sep 2025 13:24:21 +0800
From: "Chen, Yu C" <yu.c.chen@...el.com>
To: Tim Chen <tim.c.chen@...ux.intel.com>, Peter Zijlstra
<peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>
CC: Juri Lelli <juri.lelli@...hat.com>, Dietmar Eggemann
<dietmar.eggemann@....com>, Ben Segall <bsegall@...gle.com>, Mel Gorman
<mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, Tim Chen
<tim.c.chen@...el.com>, Vincent Guittot <vincent.guittot@...aro.org>, "Libo
Chen" <libo.chen@...cle.com>, Abel Wu <wuyun.abel@...edance.com>, Len Brown
<len.brown@...el.com>, <linux-kernel@...r.kernel.org>, K Prateek Nayak
<kprateek.nayak@....com>, "Gautham R . Shenoy" <gautham.shenoy@....com>,
"Zhao Liu" <zhao1.liu@...el.com>, Vinicius Costa Gomes
<vinicius.gomes@...el.com>, Arjan Van De Ven <arjan.van.de.ven@...el.com>
Subject: Re: [PATCH v3 1/2] sched: Create architecture specific sched domain distances

On 9/12/2025 2:30 AM, Tim Chen wrote:
> Allow architecture specific sched domain NUMA distances that can be
> modified from NUMA node distances for the purpose of building NUMA
> sched domains.
>
> The actual NUMA distances are kept separately. This allows for NUMA
> domain levels modification when building sched domains for specific
> architectures.
>
> Consolidate the recording of unique NUMA distances in an array to
> sched_record_numa_dist() so the function can be reused to record NUMA
> distances when the NUMA distance metric is changed.
>
> No functional change if there's no arch specific NUMA distances
> are being defined.
>
[snip]
> +
> +void sched_init_numa(int offline_node)
> +{
> + struct sched_domain_topology_level *tl;
> + int nr_levels, nr_node_levels;
> + int i, j;
> + int *distances, *domain_distances;
> + struct cpumask ***masks;
> +
> + if (sched_record_numa_dist(offline_node, numa_node_dist, &distances,
> + &nr_node_levels))
> + return;
> +
> + WRITE_ONCE(sched_avg_remote_numa_distance,
> + avg_remote_numa_distance(offline_node));
> +
> + if (sched_record_numa_dist(offline_node,
> + arch_sched_node_distance, &domain_distances,
> + &nr_levels)) {
> + kfree(distances);
> + return;
> + }
> + rcu_assign_pointer(sched_numa_node_distance, distances);
> + WRITE_ONCE(sched_max_numa_distance, distances[nr_node_levels - 1]);
[snip]
> @@ -2022,7 +2097,6 @@ void sched_init_numa(int offline_node)
> sched_domain_topology = tl;
>
> sched_domains_numa_levels = nr_levels;
> - WRITE_ONCE(sched_max_numa_distance, sched_domains_numa_distance[nr_levels - 1]);
>
Before this patch, sched_max_numa_distance was assigned a valid
value only at the end of sched_init_numa(), that is, after
sched_domains_numa_masks and the sched_domain_topology_level array
had been successfully created or appended and the kzalloc() calls
had succeeded.

Now sched_max_numa_distance is assigned earlier, regardless of
whether the NUMA sched domains are built successfully. I think this
is intended, because sched domains are only for generic load
balancing, while sched_max_numa_distance is for NUMA load balancing;
in theory, they use different metrics in their strategies. Thus,
this change should not cause any issues.
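
To illustrate the ordering difference, here is a minimal userspace
sketch. It is not the kernel code: max_numa_distance and numa_levels
only stand in for sched_max_numa_distance and
sched_domains_numa_levels, and init_numa_old(), init_numa_new(),
reset_model() and the masks_alloc_ok flag (modelling a kzalloc()
result) are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the two orderings; not kernel code. */
static int max_numa_distance;	/* stands in for sched_max_numa_distance */
static int numa_levels;		/* stands in for sched_domains_numa_levels */

static void reset_model(void)
{
	max_numa_distance = 0;
	numa_levels = 0;
}

/*
 * Old ordering: the max distance is published last, so a failed
 * allocation (masks_alloc_ok == false) leaves it untouched.
 */
static void init_numa_old(const int *dist, int nr_levels, bool masks_alloc_ok)
{
	assert(nr_levels > 0);

	if (!masks_alloc_ok)
		return;

	numa_levels = nr_levels;
	max_numa_distance = dist[nr_levels - 1];
}

/*
 * New ordering: the max distance is taken from the node distances and
 * published first, independent of whether the domain setup succeeds.
 */
static void init_numa_new(const int *node_dist, int nr_node_levels,
			  int nr_levels, bool masks_alloc_ok)
{
	assert(nr_node_levels > 0);

	max_numa_distance = node_dist[nr_node_levels - 1];

	if (!masks_alloc_ok)
		return;

	numa_levels = nr_levels;
}
```

With sorted node distances {10, 21, 31} and a failed allocation, the
old ordering leaves both values at 0, while the new ordering already
reports 31 as the max distance and leaves the level count at 0.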
From my understanding,
Reviewed-by: Chen Yu <yu.c.chen@...el.com>
thanks,
Chenyu