lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <ZWkqv+egQuph03Bm@agluck-desk3> Date: Thu, 30 Nov 2023 16:37:19 -0800 From: Tony Luck <tony.luck@...el.com> To: Reinette Chatre <reinette.chatre@...el.com> Cc: Fam Zheng <fam@...hon.net>, Fenghua Yu <fenghua.yu@...el.com>, Peter Newman <peternewman@...gle.com>, Jonathan Corbet <corbet@....net>, Shuah Khan <skhan@...uxfoundation.org>, x86@...nel.org, Shaopeng Tan <tan.shaopeng@...itsu.com>, James Morse <james.morse@....com>, Jamie Iles <quic_jiles@...cinc.com>, Babu Moger <babu.moger@....com>, Randy Dunlap <rdunlap@...radead.org>, linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org, patches@...ts.linux.dev, Shaopeng Tan <tan.shaopeng@...fujitsu.com> Subject: Re: [PATCH v12 7/8] x86/resctrl: Sub NUMA Cluster detection and enable On Thu, Nov 30, 2023 at 03:40:52PM -0800, Reinette Chatre wrote: > Hi Tony, > > On 11/30/2023 2:43 PM, Tony Luck wrote: > > On Thu, Nov 30, 2023 at 01:47:10PM -0800, Reinette Chatre wrote: > > ... > > >>> if (!x86_match_cpu(snc_cpu_ids)) > >>> return 1; > >> > >> I understand and welcome this change as motivated by robustness. Apart > >> from that, with this being a model specific feature for this particular > >> group of systems, it it not clear to me in which scenarios this could > >> run on a system where a present CPU does not have access to L3 cache. > > > > Agreed that on these systems there should always be an L3 cache. Should > > I drop the check for "-1"? > > Please do keep it. I welcome the additional robustness. The static checker I > tried did not complain about this but I expect that it is something that > could trigger checks. > > > > >>> > >>> - node_caches = bitmap_zalloc(nr_node_ids, GFP_KERNEL); > >>> + node_caches = bitmap_zalloc(num_online_cpus(), GFP_KERNEL); > >> > >> Please do take care to take new bitmap size into account in all > >> places. From what I can tell there is a later bitmap_weight() call that > >> still uses nr_node_ids as size. > > > > Oops. I was also using num_online_cpus() before cpus_read_lock(), so > > things could theoretically change before the bitmap_weight() call. > > I switched to using num_present_cpus() in both places. > > Thanks for catching this. I am not sure if num_present_cpus() is the right > choice. I found its comment to say "If HOTPLUG is enabled, then cpu_present_mask > varies dynamically ...". num_possible_cpus() seems more appropriate when looking I can size the bitmask based on num_possible_cpus(). > for something that does not change while not holding the hotplug lock. Reading its > description more closely also makes me wonder if the later > num_online_cpus() != num_present_cpus() > should also maybe be > num_online_cpus() != num_possible_cpus() ? > It seems to more closely match the intention. This seems problematic. On a system that does support physical CPU hotplug num_possible_cpus() may be some very large number. Reserving space for CPUs that can be added later. None of those CPUs can be online (obviously!). So this test would fail on such a system. > >>> if (!node_caches) > >>> return 1; > >>> > >>> @@ -1072,10 +1073,13 @@ static __init int snc_get_config(void) > >>> > >>> for_each_node(node) { > >>> cpu = cpumask_first(cpumask_of_node(node)); > >>> - if (cpu < nr_cpu_ids) > >>> - set_bit(get_cpu_cacheinfo_id(cpu, 3), node_caches); > >>> - else > >>> + if (cpu < nr_cpu_ids) { > >>> + cache_id = get_cpu_cacheinfo_id(cpu, 3); > >>> + if (cache_id != -1) > >>> + set_bit(cache_id, node_caches); > >>> + } else { > >>> mem_only_nodes++; > >>> + } > >>> } > >>> cpus_read_unlock(); > >>> > >> > >> Could this code be made even more robust by checking the computed > >> snc_nodes_per_l3_cache against the limited actually possible values? > >> Forcing it to 1 if something went wrong? > > > > Added a couple of extra sanity checks. See updated incremental patch > > below. > > Thank you very much. The additional checks look good to me. > > Reinette Thanks for looking at this. I'm applying changes to my local tree. I'll give folks a little more time to find additonal issues in v12 and post v13 next week. -Tony
Powered by blists - more mailing lists