Message-ID: <2a7cc72a-99fb-4862-b7eb-da3d515f0453@intel.com>
Date: Thu, 23 May 2024 10:03:29 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Tony Luck <tony.luck@...el.com>
CC: Fenghua Yu <fenghua.yu@...el.com>, Maciej Wieczor-Retman
<maciej.wieczor-retman@...el.com>, Peter Newman <peternewman@...gle.com>,
James Morse <james.morse@....com>, Babu Moger <babu.moger@....com>, "Drew
Fustini" <dfustini@...libre.com>, Dave Martin <Dave.Martin@....com>,
<x86@...nel.org>, <linux-kernel@...r.kernel.org>, <patches@...ts.linux.dev>
Subject: Re: [PATCH v18 15/17] x86/resctrl: Fix RMID reading sanity check for
Sub-NUMA (SNC) mode
Hi Tony,
On 5/22/24 4:47 PM, Tony Luck wrote:
> On Wed, May 22, 2024 at 02:25:23PM -0700, Reinette Chatre wrote:
>>> + /*
>>> + * SNC: OK to read events on any CPU sharing same L3
>>> + * cache instance.
>>> + */
>>> + if (d->display_id != get_cpu_cacheinfo_id(smp_processor_id(),
>>> + r->mon_display_scope))
>>
>> By hardcoding that mon_display_scope is a cache instead of using get_domain_id_from_scope()
>> it seems that all pretending about being generic has just been abandoned at this point.
>
> Yes. It now seems like a futile quest to make this look
> like something generic. All this code is operating on the
I did not consider the generic solution to be impossible. The
implementation seemed generally ok to me when viewed as a generic
solution whose implementation carries the optimization of not sending
IPIs unnecessarily, since the only user is SNC.
> rdt_resources_all[RDT_RESOURCE_L3] resource (which by its very name is
Yes, good point.
> "L3" scoped). In the SNC case the L3 has been divided (in some senses,
> but not all) into nodes.
>
> Given that pretending isn't working ... just be explicit?
>
> Some "thinking aloud" follows ...
Sure, will consider with you ...
>
> struct rdt_resource:
> In order to track monitor events, resctrl must build a domain list based
> on the smallest measurement scope. So with SNC enabled, that is the
> node. With it disabled it is L3 cache scope (which on existing systems
> is the same as node scope).
>
> Maybe keep .mon_scope with the existing name, but define it to be the
> minimum measurement scope and use it to build domains. So it
> defaults to RESCTRL_L3_CACHE but SNC detection will rewrite it to
> RESCTRL_L3_NODE.
The above has been agreed on for a while now, no? The only change is that
the new scope's name changes from RESCTRL_NODE to RESCTRL_L3_NODE?
>
> Drop the .mon_display_scope field. By definition it must always have
> the value RESCTRL_L3_CACHE. So replace checks that compare values
> rdt_resources_all[RDT_RESOURCE_L3] of .mon_scope & .mon_display_scope
> with:
>
> if (r->mon_scope != RESCTRL_L3_CACHE)
> // SNC stuff
> else
> // regular stuff
This seems reasonable considering your earlier reminder that all
monitoring is hardcoded to RDT_RESOURCE_L3. Perhaps that test could be a
macro with an elaborate comment describing the SNC view of the world? I
also think a positive test may be easier to understand
("if (r->mon_scope == RESCTRL_L3_NODE) /* SNC */"), since that makes it
easier to follow the code to where RESCTRL_L3_NODE is assigned, as
opposed to hunting for flows where mon_scope is _not_ RESCTRL_L3_CACHE.
> struct rdt_mon_domain:
> In the rdt_mon_domain rename the display_id field with the more
> honest name "l3_cache_id". In addition save a pointer to the
> .shared_cpu_map of the L3 cache. When SNC is off, this will be the
Sounds good. Since a pointer is being saved anyway, could this be
simplified, and the code made easier to understand, by saving a pointer
to the cache's struct cacheinfo instead? That gives access to both the
cache ID and the shared_cpu_map.
> same as the d->hdr.cpu_mask for the domain. For SNC on it will be
> a superset (encompassing all the bits from cpu_masks in all domains
> that share an L3 instance).
Care may be needed in scenarios where CPUs can be offlined. For example,
when SNC is enabled and the CPUs associated with all but one NUMA domain
are taken offline, the final remaining monitoring domain may have the
same CPU mask as the L3 cache even though SNC is enabled?
> Where SNC specific code is required, the check becomes:
>
> if (d->hdr.id != d->l3_cache_id)
> // SNC stuff
> else
> // regular stuff
I am not sure about these tests and will need more context on where they
will be used. For example, when SNC is enabled and NUMA node #0 belongs
to cache ID #0, the test would not detect that SNC is enabled for
monitoring domain #0?
> The l3_cache_id can be used in mkdir code to make the mon_L3_XX
> directories. The L3 .shared_cpu_map in picking a CPU to read
> the counters for the "sum" files. l3_cache_id also indicates
> which domains should be summed.
Using the L3 .shared_cpu_map to pick CPU sounds good. It really makes
it obvious what is going on.
> Does this look like a useful direction to pursue?
As I understand it, this will make the code obviously specific to SNC
without changing the flow of the implementation in this series. I do
continue to believe that many of the flows supporting SNC are not
intuitive (to me), so I would like to keep my request that the SNC
portions carry clear comments explaining why they do what they do,
rather than leaving the reader with the impression of
"if (SNC specific check) /* quirks */ ". This will help future changes
to these areas.
Reinette