[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5078f930-e56e-45b5-9df3-99e88c0858dd@intel.com>
Date: Thu, 30 Nov 2023 15:40:52 -0800
From: Reinette Chatre <reinette.chatre@...el.com>
To: Tony Luck <tony.luck@...el.com>
CC: Fam Zheng <fam@...hon.net>, Fenghua Yu <fenghua.yu@...el.com>,
"Peter Newman" <peternewman@...gle.com>,
Jonathan Corbet <corbet@....net>,
Shuah Khan <skhan@...uxfoundation.org>, <x86@...nel.org>,
Shaopeng Tan <tan.shaopeng@...itsu.com>,
James Morse <james.morse@....com>,
Jamie Iles <quic_jiles@...cinc.com>,
Babu Moger <babu.moger@....com>,
Randy Dunlap <rdunlap@...radead.org>,
<linux-kernel@...r.kernel.org>, <linux-doc@...r.kernel.org>,
<patches@...ts.linux.dev>,
Shaopeng Tan <tan.shaopeng@...fujitsu.com>
Subject: Re: [PATCH v12 7/8] x86/resctrl: Sub NUMA Cluster detection and
enable
Hi Tony,
On 11/30/2023 2:43 PM, Tony Luck wrote:
> On Thu, Nov 30, 2023 at 01:47:10PM -0800, Reinette Chatre wrote:
...
>>> if (!x86_match_cpu(snc_cpu_ids))
>>> return 1;
>>
>> I understand and welcome this change as motivated by robustness. Apart
>> from that, with this being a model specific feature for this particular
>> group of systems, it it not clear to me in which scenarios this could
>> run on a system where a present CPU does not have access to L3 cache.
>
> Agreed that on these systems there should always be an L3 cache. Should
> I drop the check for "-1"?
Please do keep it. I welcome the additional robustness. The static checker I
tried did not complain about this but I expect that it is something that
could trigger checks.
>
>>>
>>> - node_caches = bitmap_zalloc(nr_node_ids, GFP_KERNEL);
>>> + node_caches = bitmap_zalloc(num_online_cpus(), GFP_KERNEL);
>>
>> Please do take care to take new bitmap size into account in all
>> places. From what I can tell there is a later bitmap_weight() call that
>> still uses nr_node_ids as size.
>
> Oops. I was also using num_online_cpus() before cpus_read_lock(), so
> things could theoretically change before the bitmap_weight() call.
> I switched to using num_present_cpus() in both places.
Thanks for catching this. I am not sure if num_present_cpus() is the right
choice. I found its comment to say "If HOTPLUG is enabled, then cpu_present_mask
varies dynamically ...". num_possible_cpus() seems more appropriate when looking
for something that does not change while not holding the hotplug lock. Reading its
description more closely also makes me wonder if the later
num_online_cpus() != num_present_cpus()
should also maybe be
num_online_cpus() != num_possible_cpus() ?
It seems to more closely match the intention.
>>> if (!node_caches)
>>> return 1;
>>>
>>> @@ -1072,10 +1073,13 @@ static __init int snc_get_config(void)
>>>
>>> for_each_node(node) {
>>> cpu = cpumask_first(cpumask_of_node(node));
>>> - if (cpu < nr_cpu_ids)
>>> - set_bit(get_cpu_cacheinfo_id(cpu, 3), node_caches);
>>> - else
>>> + if (cpu < nr_cpu_ids) {
>>> + cache_id = get_cpu_cacheinfo_id(cpu, 3);
>>> + if (cache_id != -1)
>>> + set_bit(cache_id, node_caches);
>>> + } else {
>>> mem_only_nodes++;
>>> + }
>>> }
>>> cpus_read_unlock();
>>>
>>
>> Could this code be made even more robust by checking the computed
>> snc_nodes_per_l3_cache against the limited actually possible values?
>> Forcing it to 1 if something went wrong?
>
> Added a couple of extra sanity checks. See updated incremental patch
> below.
Thank you very much. The additional checks look good to me.
Reinette
Powered by blists - more mailing lists