[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a10e3334-3616-43ee-84bd-f26469f52051@intel.com>
Date: Wed, 22 Oct 2025 21:33:59 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Tony Luck <tony.luck@...el.com>, Fenghua Yu <fenghuay@...dia.com>, "Maciej
Wieczor-Retman" <maciej.wieczor-retman@...el.com>, Peter Newman
<peternewman@...gle.com>, James Morse <james.morse@....com>, Babu Moger
<babu.moger@....com>, Drew Fustini <dfustini@...libre.com>, Dave Martin
<Dave.Martin@....com>, Chen Yu <yu.c.chen@...el.com>
CC: <x86@...nel.org>, <linux-kernel@...r.kernel.org>,
<patches@...ts.linux.dev>
Subject: Re: [PATCH v12 13/31] x86,fs/resctrl: Add and initialize rdt_resource
for package scope monitor
Hi Tony,
On 10/13/25 3:33 PM, Tony Luck wrote:
> Add a new PERF_PKG resource and introduce package level scope for
> monitoring telemetry events so that CPU hot plug notifiers can build
> domains at the package granularity.
>
> Use the physical package ID available via topology_physical_package_id()
> to identify the monitoring domains with package level scope. This enables
> user space to use:
> /sys/devices/system/cpu/cpuX/topology/physical_package_id
> to identify the monitoring domain a CPU is associated with.
>
> Now there is a new monitoring resource, add a WARN to places that
> implicitly assume RDT_RESOURCE_L3.
>
> Update some kerneldoc to point out L3 dependencies.
>
> Signed-off-by: Tony Luck <tony.luck@...el.com>
> Reviewed-by: Reinette Chatre <reinette.chatre@...el.com>
I do not believe that the changes added since my RB tag fit with this patch.
As mentioned in patch #5 I find that (all but one of) the new changes added
to this patch can be merged with those stray hunks found in patch #5 to form one
logical change.
Below is a high level of what I have in mind presented as a draft changelog
for that new patch:
The feature to sum event data across multiple domains supports systems
with Sub-NUMA Cluster (SNC) mode enabled. The top-level monitoring files
in each "mon_L3_XX" directory provide the sum of data across all SNC
nodes sharing an L3 cache instance while the "mon_sub_L3_YY" sub-directories
provide the event data of the individual nodes.
SNC is only associated with the L3 resource and domains and as a result
the flow handling the sum of event data implicitly assumes it is
working with the L3 resource and domains.
Reading of telemetry events do not require to sum event data so this
feature can remain dedicated to SNC and keep the implicit assumption
of working with the L3 resource and domains.
Add a WARN to where the implicit assumption of working with the L3 resource
is made and add comments on how the structure controlling the event sum
feature is used.
> ---
> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> index f5189b6771a0..39bdaf45fa2a 100644
> --- a/fs/resctrl/internal.h
> +++ b/fs/resctrl/internal.h
> @@ -92,8 +92,8 @@ extern struct mon_evt mon_event_all[QOS_NUM_EVENTS];
> * @list: Member of the global @mon_data_kn_priv_list list.
> * @rid: Resource id associated with the event file.
> * @evt: Event structure associated with the event file.
> - * @sum: Set when event must be summed across multiple
> - * domains.
> + * @sum: Set for RDT_RESOURCE_L3 when event must be summed
> + * across multiple domains.
> * @domid: When @sum is zero this is the domain to which
> * the event file belongs. When @sum is one this
> * is the id of the L3 cache that all domains to be
above can move to the new patch
...
> diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
> index ae43e09fa5e5..d681b71e6eca 100644
> --- a/fs/resctrl/ctrlmondata.c
> +++ b/fs/resctrl/ctrlmondata.c
> @@ -712,6 +712,9 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
> if (md->sum) {
> struct rdt_l3_mon_domain *d;
>
> + if (WARN_ON_ONCE(resid != RDT_RESOURCE_L3))
> + return -EINVAL;
> +
> /*
> * This file requires summing across all domains that share
> * the L3 cache id that was provided in the "domid" field of the
above can move to the new patch ... it can form a new hunk with the hunk from
patch #5 like:
if (md->sum) {
+ struct rdt_l3_mon_domain *d;
+
+ if (WARN_ON_ONCE(resid != RDT_RESOURCE_L3))
+ FIXME
+
/*
* This file requires summing across all domains that share
* the L3 cache id that was provided in the "domid" field of the
I added the FIXME because this should not do a direct return here while holding
CPU hotplug lock as well as rdtgroup mutex.
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 255d4bad24cb..4c984dc6784e 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -501,6 +501,9 @@ static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
> * all domains fail for any reason.
> */
> ret = -EINVAL;
> + if (WARN_ON_ONCE(rr->r->rid != RDT_RESOURCE_L3))
> + return ret;
> +
This looks redundant to me when considering the WARN added to rdtgroup_mondata_show()
... and it is removed in patch #18 anyway.
> list_for_each_entry(d, &rr->r->mon_domains, hdr.list) {
> if (d->ci_id != rr->ci->id)
> continue;
...
> @@ -3028,7 +3030,8 @@ static void rmdir_all_sub(void)
> * @rid: The resource id for the event file being created.
> * @domid: The domain id for the event file being created.
> * @mevt: The type of event file being created.
> - * @do_sum: Whether SNC summing monitors are being created.
> + * @do_sum: Whether SNC summing monitors are being created. Only set
> + * when @rid == RDT_RESOURCE_L3.
> */
> static struct mon_data *mon_get_kn_priv(enum resctrl_res_level rid, int domid,
> struct mon_evt *mevt,
above can move to new patch
Reinette
Powered by blists - more mailing lists