Message-ID: <ZWj3NdI/qLNOgyg0@agluck-desk3>
Date: Thu, 30 Nov 2023 12:57:25 -0800
From: Tony Luck <tony.luck@...el.com>
To: Fam Zheng <fam@...hon.net>
Cc: Fenghua Yu <fenghua.yu@...el.com>,
Reinette Chatre <reinette.chatre@...el.com>,
Peter Newman <peternewman@...gle.com>,
Jonathan Corbet <corbet@....net>,
Shuah Khan <skhan@...uxfoundation.org>, x86@...nel.org,
Shaopeng Tan <tan.shaopeng@...itsu.com>,
James Morse <james.morse@....com>,
Jamie Iles <quic_jiles@...cinc.com>,
Babu Moger <babu.moger@....com>,
Randy Dunlap <rdunlap@...radead.org>,
linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
patches@...ts.linux.dev, Shaopeng Tan <tan.shaopeng@...fujitsu.com>
Subject: Re: [PATCH v12 7/8] x86/resctrl: Sub NUMA Cluster detection and
enable

On Thu, Nov 30, 2023 at 06:02:42PM +0000, Fam Zheng wrote:
> > +static __init int snc_get_config(void)
> > +{
> > +	unsigned long *node_caches;
> > +	int mem_only_nodes = 0;
> > +	int cpu, node, ret;
> > +	int num_l3_caches;
> > +
> > +	if (!x86_match_cpu(snc_cpu_ids))
> > +		return 1;
> > +
> > +	node_caches = bitmap_zalloc(nr_node_ids, GFP_KERNEL);
> > +	if (!node_caches)
> > +		return 1;
> > +
> > +	cpus_read_lock();
> > +
> > +	if (num_online_cpus() != num_present_cpus())
> > +		pr_warn("Some CPUs offline, SNC detection may be incorrect\n");
> > +
> > +	for_each_node(node) {
> > +		cpu = cpumask_first(cpumask_of_node(node));
> > +		if (cpu < nr_cpu_ids)
> > +			set_bit(get_cpu_cacheinfo_id(cpu, 3), node_caches);
>
> Are we sure get_cpu_cacheinfo_id() is a valid index here? Looking at
> the function it could be -1 or larger than nr_node_ids.

Fam,

Returning -1 is possible (in the case where the first CPU on a node
doesn't have an L3 cache). Larger than nr_node_ids seems a bit more
speculative: it would mean a system with multiple L3 cache instances
per node. I suppose that's theoretically possible. In the limiting case
every CPU may have its own personal L3 cache, but still have multiple
CPUs grouped together on a node.
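
For reference, get_cpu_cacheinfo_id() (in drivers/base/cacheinfo.c)
looks roughly like this -- it returns -1 both when the CPU has no
level-3 cache leaf and when the leaf doesn't carry a valid cache ID
(quoting from memory, so double-check against the tree):

int get_cpu_cacheinfo_id(int cpu, int level)
{
	struct cpu_cacheinfo *ci = get_cpu_cacheinfo(cpu);
	int i;

	for (i = 0; i < ci->num_leaves; i++) {
		if (ci->info_list[i].level == level) {
			/* Leaf found; id is only valid if CACHE_ID is set */
			if (ci->info_list[i].attributes & CACHE_ID)
				return ci->info_list[i].id;
			return -1;
		}
	}

	return -1;
}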

Patch below (to be folded into part 7 of the next version). It
increases the size of the bitmap and checks for get_cpu_cacheinfo_id()
returning -1, in which case the node is simply ignored. I'm never quite
sure how much code to add for "can't happen" scenarios.

-Tony

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 3293ab4c58b0..85f8a1b3feaf 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -1056,12 +1056,13 @@ static __init int snc_get_config(void)
 	unsigned long *node_caches;
 	int mem_only_nodes = 0;
 	int cpu, node, ret;
+	int cache_id;
 	int num_l3_caches;
 
 	if (!x86_match_cpu(snc_cpu_ids))
 		return 1;
 
-	node_caches = bitmap_zalloc(nr_node_ids, GFP_KERNEL);
+	node_caches = bitmap_zalloc(num_online_cpus(), GFP_KERNEL);
 	if (!node_caches)
 		return 1;
 
@@ -1072,10 +1073,13 @@ static __init int snc_get_config(void)
 
 	for_each_node(node) {
 		cpu = cpumask_first(cpumask_of_node(node));
-		if (cpu < nr_cpu_ids)
-			set_bit(get_cpu_cacheinfo_id(cpu, 3), node_caches);
-		else
+		if (cpu < nr_cpu_ids) {
+			cache_id = get_cpu_cacheinfo_id(cpu, 3);
+			if (cache_id != -1)
+				set_bit(cache_id, node_caches);
+		} else {
 			mem_only_nodes++;
+		}
 	}
 
 	cpus_read_unlock();
--
2.41.0
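
For completeness, here is roughly how the rest of snc_get_config()
consumes the bitmap after the loop (a sketch of the shape in the posted
series, not the literal hunk -- names and error paths may differ
slightly in the final version):

	/* Count distinct L3 cache IDs seen across all CPU nodes */
	num_l3_caches = bitmap_weight(node_caches, num_online_cpus());
	bitmap_free(node_caches);

	if (!num_l3_caches)
		return 1;

	/* CPU nodes should divide evenly among the L3 instances */
	if ((nr_node_ids - mem_only_nodes) % num_l3_caches)
		return 1;

	/* SNC nodes per L3 cache = ratio of CPU nodes to L3 instances */
	ret = (nr_node_ids - mem_only_nodes) / num_l3_caches;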