[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aecb9aaa-5b94-4787-a976-a633831596bf@intel.com>
Date: Fri, 3 Oct 2025 16:52:01 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Tony Luck <tony.luck@...el.com>, Fenghua Yu <fenghuay@...dia.com>, "Maciej
Wieczor-Retman" <maciej.wieczor-retman@...el.com>, Peter Newman
<peternewman@...gle.com>, James Morse <james.morse@....com>, Babu Moger
<babu.moger@....com>, Drew Fustini <dfustini@...libre.com>, Dave Martin
<Dave.Martin@....com>, Chen Yu <yu.c.chen@...el.com>
CC: <x86@...nel.org>, <linux-kernel@...r.kernel.org>,
<patches@...ts.linux.dev>
Subject: Re: [PATCH v11 17/31] x86/resctrl: Find and enable usable telemetry
events
Hi Tony,
On 9/25/25 1:03 PM, Tony Luck wrote:
> The INTEL_PMT_TELEMETRY driver provides telemetry region structures of the
> types requested by resctrl.
>
> Scan these structures to discover which pass sanity checks to derive
> a list of valid regions:
The "to derive a list of valid regions" does not align with the
"At least one region passes the above checks" requirement. If this is about
valid (usable?) regions then I think (4) should be dropped. If this is instead about
valid events then above should be reworded to say that instead.
>
> 1) They have guid known to resctrl.
> 2) They have a valid package ID.
> 3) The enumerated size of the MMIO region matches the expected
> value from the XML description file.
> 4) At least one region passes the above checks.
>
Everything below is clear by looking at the patch. It can also be seen from patch
that enabling is done only once if there is *any* valid region instead of "for each
valid region". One thing that may be useful to add is "why" all events
can be enabled. If I understand correctly it can be something like:
Enable events that usable telemetry regions are responsible for.
> For each valid region enable all the events in the associated
> event_group::evts[].
>
> Pass a pointer to the pmt_event structure of the event within the struct
> event_group that resctrl stores in mon_evt::arch_priv. resctrl passes
> this pointer back when asking to read the event data which enables the
> data to be found in MMIO.
>
> Signed-off-by: Tony Luck <tony.luck@...el.com>
> ---
> arch/x86/kernel/cpu/resctrl/intel_aet.c | 38 +++++++++++++++++++++++--
> 1 file changed, 36 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> index f9b5f6cd08f8..98ba9ba05ee5 100644
> --- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
> +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> @@ -20,9 +20,11 @@
> #include <linux/intel_pmt_features.h>
> #include <linux/intel_vsec.h>
> #include <linux/overflow.h>
> +#include <linux/printk.h>
> #include <linux/resctrl.h>
> #include <linux/resctrl_types.h>
> #include <linux/stddef.h>
> +#include <linux/topology.h>
> #include <linux/types.h>
>
> #include "internal.h"
> @@ -114,12 +116,44 @@ static struct event_group *known_perf_event_groups[] = {
> for (_peg = (_grp); _peg < &_grp[ARRAY_SIZE(_grp)]; _peg++) \
> if ((*_peg)->pfg)
>
> -/* Stub for now */
> -static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
> +static bool skip_telem_region(struct telemetry_region *tr, struct event_group *e)
> {
> + if (tr->guid != e->guid)
> + return true;
> + if (tr->plat_info.package_id >= topology_max_packages()) {
> + pr_warn("Bad package %u in guid 0x%x\n", tr->plat_info.package_id,
> + tr->guid);
> + return true;
> + }
I have not encountered any mention of the possibility that packages may differ
in which telemetry region types they support. For example, could it be possible for package
A to have usable regions of the PERF type but package B doesn't? From what I can tell
INTEL_PMT_TELEMETRY supports layouts where this can be possible. If I understand correctly
this implementation will create event files for these domains but when the user attempts to
read the data it will fail. Can this work add some snippet about possibility of this
scenario and if/how it is supported?
> + if (tr->size != e->mmio_size) {
> + pr_warn("MMIO space wrong size (%zu bytes) for guid 0x%x. Expected %zu bytes.\n",
> + tr->size, e->guid, e->mmio_size);
> + return true;
> + }
> +
> return false;
> }
>
> +static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
> +{
> + bool usable_events = false;
> +
> + for (int i = 0; i < p->count; i++) {
> + if (skip_telem_region(&p->regions[i], e))
> + continue;
> + usable_events = true;
A previous concern [1] was why this loop does not break out at this point. I think it will
help to make this clear if marking a telemetry region as unusable (mark_telem_region_unusable())
is done in this patch. Doing so makes the "usable" and "unusable" distinction in one
patch while making clear that the loop needs to complete.
> + }
> +
> + if (!usable_events)
> + return false;
> +
> + for (int j = 0; j < e->num_events; j++)
> + resctrl_enable_mon_event(e->evts[j].id, true,
> + e->evts[j].bin_bits, &e->evts[j]);
> +
> + return true;
> +}
> +
> DEFINE_FREE(intel_pmt_put_feature_group, struct pmt_feature_group *,
> if (!IS_ERR_OR_NULL(_T))
> intel_pmt_put_feature_group(_T))
Reinette
[1] https://lore.kernel.org/lkml/9ac43e78-8955-db5d-61be-e08008e41f0d@linux.intel.com/
Powered by blists - more mailing lists