[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fad9fc6f-98b2-4ea6-aa47-ddb26e66f5b0@intel.com>
Date: Wed, 22 Oct 2025 21:28:05 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Tony Luck <tony.luck@...el.com>, Fenghua Yu <fenghuay@...dia.com>, "Maciej
Wieczor-Retman" <maciej.wieczor-retman@...el.com>, Peter Newman
<peternewman@...gle.com>, James Morse <james.morse@....com>, Babu Moger
<babu.moger@....com>, Drew Fustini <dfustini@...libre.com>, Dave Martin
<Dave.Martin@....com>, Chen Yu <yu.c.chen@...el.com>
CC: <x86@...nel.org>, <linux-kernel@...r.kernel.org>,
<patches@...ts.linux.dev>
Subject: Re: [PATCH v12 14/31] x86/resctrl: Discover hardware telemetry events
Hi Tony,
On 10/13/25 3:33 PM, Tony Luck wrote:
> Each CPU collects data for telemetry events that it sends to the nearest
> telemetry event aggregator either when the value of MSR_IA32_PQR_ASSOC.RMID
> changes, or when a two millisecond timer expires.
>
> The telemetry event aggregators maintain per-RMID per-event counts of the
> total seen for all the CPUs. There may be more than one set of telemetry
> event aggregators per package.
>
> There are separate sets of aggregators for each type of event, but all
> aggregators for a given type are symmetric keeping counts for the same
> set of events for the CPUs that provide data to them.
Above still seems to reflect previous implementation that only supported
one guid per type of event. If I understand correctly this implementation
supports the scenario where there may be aggregators of one type that support
different guid and thus are not symmetric.
>
> Each telemetry event aggregator is responsible for a specific group of
> events. E.g. on the Intel Clearwater Forest CPU there are two types of
Could you please define "event group"? It is first used in this patch
but I cannot seem to find definition. Would be helpful to distinguish it from
a "feature group".
> aggregators. One type tracks a pair of energy related events. The other
> type tracks a subset of "perf" type events.
>
> The event counts are made available to Linux in a region of MMIO space
> for each aggregator. All details about the layout of counters in each
> aggregator MMIO region are described in XML files published by Intel and
> made available in a GitHub repository [1].
>
> The key to matching a specific telemetry aggregator to the XML file that
> describes the MMIO layout is a 32-bit value. The Linux telemetry subsystem
> refers to this as a "guid" while the XML files call it a "uniqueid".
https://github.com/intel/Intel-PMT mentions:
Using guid and size information, the associated Intel PMT XML files
set can be identified using repository metadata at xml/pmt.xml.
>From the above quote it seems as though size should also be used as key to
match aggregator to the XML file? Also, looking at
https://github.com/intel/Intel-PMT/blob/main/xml/pmt.xml the file does use
"guid" so there does not seem to be a need to mention that XML files call it
"uniqueid".
>
> Each XML file provides the following information:
> 1) Which telemetry events are included in the group.
> 2) The order in which the event counters appear for each RMID.
> 3) The value type of each event counter (integer or fixed-point).
> 4) The number of RMIDs supported.
> 5) Which additional aggregator status registers are included.
> 6) The total size of the MMIO region for an aggregator.
>
> The INTEL_PMT_TELEMETRY driver enumerates support for telemetry events.
> This driver provides intel_pmt_get_regions_by_feature() to list all
> available telemetry event aggregators. The list includes the "guid",
> the base address in MMIO space for the region where the event counters
> are exposed, and the package id where the all the CPUs that report to this
> aggregator are located.
>
> Add a new Kconfig option CONFIG_X86_CPU_RESCTRL_INTEL_AET for the Intel
> specific parts of telemetry code. This depends on the INTEL_PMT_TELEMETRY
> and INTEL_TPMI drivers being built-in to the kernel for enumeration of
> telemetry features.
>
> Use INTEL_PMT_TELEMETRY's intel_pmt_get_regions_by_feature() with
> each per-RMID telemetry feature id to obtain a private copy of
"feature id" is not defined or introduced until here. Perhaps it should be
"event type" instead to remain consistent with rest of changelog?
> struct pmt_feature_group that contains all discovered/enumerated
> telemetry aggregator data for all event groups (known and unknown
> to resctrl) of that feature id. Further processing on this structure
> will enable all supported events in resctrl. Return the structure to
> INTEL_PMT_TELEMETRY at resctrl exit time.
This still documents the previous design.
>
> Signed-off-by: Tony Luck <tony.luck@...el.com>
> Link: https://github.com/intel/Intel-PMT # [1]
> ---
> arch/x86/kernel/cpu/resctrl/internal.h | 8 ++
> arch/x86/kernel/cpu/resctrl/core.c | 5 +
> arch/x86/kernel/cpu/resctrl/intel_aet.c | 140 ++++++++++++++++++++++++
> arch/x86/Kconfig | 13 +++
> arch/x86/kernel/cpu/resctrl/Makefile | 1 +
> 5 files changed, 167 insertions(+)
> create mode 100644 arch/x86/kernel/cpu/resctrl/intel_aet.c
>
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 14fadcff0d2b..886261a82b81 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -217,4 +217,12 @@ void __init intel_rdt_mbm_apply_quirk(void);
> void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
> void resctrl_arch_mbm_cntr_assign_set_one(struct rdt_resource *r);
>
> +#ifdef CONFIG_X86_CPU_RESCTRL_INTEL_AET
> +bool intel_aet_get_events(void);
> +void __exit intel_aet_exit(void);
> +#else
> +static inline bool intel_aet_get_events(void) { return false; }
> +static inline void __exit intel_aet_exit(void) { }
> +#endif
> +
> #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 57a328fdde59..9451bafde923 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -738,6 +738,9 @@ void resctrl_arch_pre_mount(void)
>
> if (!atomic_try_cmpxchg(&only_once, &old, 1))
> return;
> +
> + if (!intel_aet_get_events())
> + return;
> }
>
> enum {
> @@ -1095,6 +1098,8 @@ late_initcall(resctrl_arch_late_init);
>
> static void __exit resctrl_arch_exit(void)
> {
> + intel_aet_exit();
> +
> cpuhp_remove_state(rdt_online);
>
> resctrl_exit();
> diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> new file mode 100644
> index 000000000000..67e479bdbc93
> --- /dev/null
> +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> @@ -0,0 +1,140 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Resource Director Technology(RDT)
> + * - Intel Application Energy Telemetry
> + *
> + * Copyright (C) 2025 Intel Corporation
> + *
> + * Author:
> + * Tony Luck <tony.luck@...el.com>
> + */
> +
> +#define pr_fmt(fmt) "resctrl: " fmt
> +
> +#include <linux/array_size.h>
> +#include <linux/cleanup.h>
> +#include <linux/cpu.h>
> +#include <linux/err.h>
> +#include <linux/init.h>
> +#include <linux/intel_pmt_features.h>
> +#include <linux/intel_vsec.h>
> +#include <linux/overflow.h>
> +#include <linux/resctrl.h>
> +#include <linux/stddef.h>
> +#include <linux/types.h>
> +
> +#include "internal.h"
> +
> +/**
> + * struct event_group - All information about a group of telemetry events.
Please place the definition of "event group" in this description also. At this time
it looks as though descriptions have not taken into account the design change made in
this version.
> + * @pfg: Points to the aggregated telemetry space information
> + * returned by the intel_pmt_get_regions_by_feature()
> + * call to the INTEL_PMT_TELEMETRY driver that contains
> + * data for all telemetry regions of a specific type.
> + * Valid if the system supports the event group.
> + * NULL otherwise.
> + * @guid: Unique number per XML description file.
> + */
> +struct event_group {
> + /* Data fields for additional structures to manage this group. */
> + struct pmt_feature_group *pfg;
> +
> + /* Remaining fields initialized from XML file. */
> + u32 guid;
> +};
> +
> +/*
> + * Link: https://github.com/intel/Intel-PMT
> + * File: xml/CWF/OOBMSM/RMID-ENERGY/cwf_aggregator.xml
> + */
> +static struct event_group energy_0x26696143 = {
> + .guid = 0x26696143,
> +};
> +
> +/*
> + * Link: https://github.com/intel/Intel-PMT
> + * File: xml/CWF/OOBMSM/RMID-PERF/cwf_aggregator.xml
> + */
> +static struct event_group perf_0x26557651 = {
> + .guid = 0x26557651,
> +};
> +
> +static struct event_group *known_energy_event_groups[] = {
> + &energy_0x26696143,
> +};
> +
> +static struct event_group *known_perf_event_groups[] = {
> + &perf_0x26557651,
> +};
> +
> +#define for_each_enabled_event_group(_peg, _grp) \
> + for (_peg = (_grp); _peg < &_grp[ARRAY_SIZE(_grp)]; _peg++) \
> + if ((*_peg)->pfg)
> +
> +/* Stub for now */
> +static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
> +{
> + return false;
> +}
> +
> +/*
> + * Make a request to the INTEL_PMT_TELEMETRY driver for a copy of the
> + * pmt_feature_group for a specific feature. If there is one, the returned
> + * structure has an array of telemetry_region structures, each element of
> + * the array describes one telemetry aggregator.
Above describes the previous design?
> + * A single pmt_feature_group may include multiple different guids.
"A single pmt_feature_group may include multiple different guids." ->
"The telemetry aggregators may have different guids."?
> + * Try to use every telemetry aggregator with a known guid.
Can "Try to use" be replaced with more specific description of this implementation?
> + */
> +static bool get_pmt_feature(enum pmt_feature_id feature, struct event_group **evgs,
> + unsigned int num_evg)
> +{
> + struct pmt_feature_group *p;
> + struct event_group **peg;
> + bool ret = false;
> +
> + for (peg = evgs; peg < &evgs[num_evg]; peg++) {
> + p = intel_pmt_get_regions_by_feature(feature);
> + if (IS_ERR_OR_NULL(p))
> + return false;
This appears to assume that intel_pmt_get_regions_by_feature() can only fail if the
feature is invalid/unavailable? I do not think resctrl should make assumptions about
what could cause a failure. It is theoretically possible that one call may succeed
and the second call with the same feature ID fails. This scenario is not handled here
and will result in a leak.
> + if (enable_events(*peg, p)) {
> + (*peg)->pfg = p;
> + ret = true;
> + } else {
> + intel_pmt_put_feature_group(p);
> + }
> + }
> +
> + return ret;
> +}
> +
> +/*
> + * Ask INTEL_PMT_TELEMETRY driver for all the RMID based telemetry groups
> + * that it supports.
> + */
Reinette
Powered by blists - more mailing lists