Message-ID: <7deaa25a-ecd6-478c-91f4-273aa1f049d2@intel.com>
Date: Wed, 12 Nov 2025 20:11:58 -0800
From: Reinette Chatre <reinette.chatre@...el.com>
To: Tony Luck <tony.luck@...el.com>, Fenghua Yu <fenghuay@...dia.com>, "Maciej
 Wieczor-Retman" <maciej.wieczor-retman@...el.com>, Peter Newman
	<peternewman@...gle.com>, James Morse <james.morse@....com>, Babu Moger
	<babu.moger@....com>, Drew Fustini <dfustini@...libre.com>, Dave Martin
	<Dave.Martin@....com>, Chen Yu <yu.c.chen@...el.com>
CC: <x86@...nel.org>, <linux-kernel@...r.kernel.org>,
	<patches@...ts.linux.dev>
Subject: Re: [PATCH v13 16/32] x86/resctrl: Discover hardware telemetry events

Hi Tony,

On 10/29/25 9:20 AM, Tony Luck wrote:
> Each CPU collects data for telemetry events that it sends to the nearest
> telemetry event aggregator either when the value of MSR_IA32_PQR_ASSOC.RMID
> changes, or when a two millisecond timer expires.
> 
> There is a guid and an MMIO region associated with each aggregator. The

Could this be:
	There is a feature type ("energy" or "perf"), guid, and MMIO region
	associated with each aggregator. This combination links to an XML ...

> combination of the guid and the size of the MMIO region link to an XML description
> of the set of telemetry events tracked by the aggregator. XML files are published
> by Intel in a GitHub repository [1].
> 
> The telemetry event aggregators maintain per-RMID per-event counts of the
> total seen for all the CPUs. There may be multiple telemetry
> event aggregators per package.
> 
> There are separate sets of aggregators for each type of event, but all

"type of event" -> "feature type, for example "perf" or "energy". All ..."

Would this be accurate:
	There are separate sets of aggregators for each feature type. Aggregators
	in a set may have different guids. All aggregators with the same feature
	type and guid are symmetric ...

> aggregators with the same guid are symmetric keeping counts for the same
> set of events for the CPUs that provide data to them.
> 
> The XML file for each aggregator provides the following information:

Would this be accurate? 

 "0) Feature type of the events ("perf" or "energy")"?

> 1) Which telemetry events are included in the group.

"included in the group" -> "tracked by the aggregator"?

> 2) The order in which the event counters appear for each RMID.
> 3) The value type of each event counter (integer or fixed-point).
> 4) The number of RMIDs supported.
> 5) Which additional aggregator status registers are included.
> 6) The total size of the MMIO region for an aggregator.
> 
> The resctrl implementation condenses the relevant information from the
> XML file into some of the fields of struct event_group.

(above implies struct event_group already exists)

How about:
"Introduce struct event_group that condenses the relevant information
from an XML file. Hereafter an "event group" refers to a group of events of
a particular feature type ("energy" or "perf") with a particular guid."

(The above also tries to give a definition of the "event group" mentioned in v12; please
do correct and feel free to improve.)
> The INTEL_PMT_TELEMETRY driver enumerates support for telemetry events.  This
> driver provides intel_pmt_get_regions_by_feature() to list all available telemetry
> event aggregators of a given enum pmt_feature_id type. The list includes the
(at this point it should be clear what "feature type" means).
"enum pmt_feature_id type" -> "feature type"

> "guid", the base address in MMIO space for the region where the event counters
> are exposed, and the package id where the all the CPUs that report to this
> aggregator are located.
> 

The example below describes the behavior of the implementation in a particular scenario before introducing
the implementation itself. I think the general changelog can explain that such a scenario is possible, and I
tried to do so with the earlier sample text. If you feel that this example is still helpful then I would
propose placing it at the end of the changelog, after the general description of the implementation, with a
heading that the reader can use to decide whether to read or skip what follows. Placing it in the maintainer
notes is also an option.
 

> A theoretical example struct pmt_feature_group returned from the INTEL_PMT_TELEMETRY
> driver for events of type FEATURE_PER_RMID_PERF_TELEM could look like this:
> 
>  +-------------------------------+
>  | count = 6                     |
>  +-------------------------------+
>  | [0] guid_1 size_1 pkg1 addr_1 |
>  +-------------------------------+
>  | [1] guid_1 size_1 pkg2 addr_2 |
>  +-------------------------------+
>  | [2] guid_2 size_2 pkg1 addr_3 |
>  +-------------------------------+
>  | [3] guid_2 size_2 pkg1 addr_4 |
>  +-------------------------------+
>  | [4] guid_2 size_2 pkg2 addr_5 |
>  +-------------------------------+
>  | [5] guid_2 size_2 pkg2 addr_6 |
>  +-------------------------------+
> 
> This provides details for "perf" aggregators with two guids. If resctrl
> has an event_group for both of these guids it will get two copies of this
> struct pmt_feature_group by calling intel_pmt_get_regions_by_feature()
> once for each. event_group::pfg will point to the copy acquired from
> each call.
> 
> On the call for guid1 it will see there is just one aggregator per package for
> guid_1. So resctrl can read event counts from the MMIO addr_1 on package 1 and
> addr_2 on package 2.
> 
> There are two aggregators listed on each package for guid_2. So resctrl must
> read counters from addr_3 and addr_4 and sum them to provide result for package
> 1. Similarly addr_5 and addr_6 must be read and summed for event counts on
> package 2.
> 
> resctrl will silently ignore unknown guid values.
> 
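Not a changelog request, just checking my understanding of the summing described above. A rough
sketch follows; the struct layout, the field names (count, regions[], guid, package_id, addr),
and the counter-read helper are placeholders that I made up for illustration, not the actual
INTEL_PMT_TELEMETRY interface:

/*
 * Sketch only: layout, field names, and the read helper are invented
 * for illustration, not the real INTEL_PMT_TELEMETRY interface.
 */
struct sketch_region {
	u32		guid;
	u32		package_id;
	void __iomem	*addr;	/* base of this aggregator's MMIO counters */
};

struct sketch_feature_group {
	int			count;
	struct sketch_region	regions[];
};

/* Placeholder: read one event counter for one RMID from one aggregator. */
static u64 sketch_read_counter(struct sketch_region *r, u32 rmid, int evt)
{
	/* e.g. readq(r->addr + rmid * RMID_STRIDE + evt * 8), layout assumed */
	return 0;
}

/*
 * Sum every aggregator with @guid on @package_id, e.g. addr_3 + addr_4
 * for guid_2 on package 1 in the table above.
 */
static u64 sketch_event_count(struct sketch_feature_group *p, u32 guid,
			      u32 package_id, u32 rmid, int evt)
{
	u64 sum = 0;

	for (int i = 0; i < p->count; i++) {
		if (p->regions[i].guid == guid &&
		    p->regions[i].package_id == package_id)
			sum += sketch_read_counter(&p->regions[i], rmid, evt);
	}

	return sum;
}

If that matches the intended behavior then the example itself reads correctly to me.
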
> Add a new Kconfig option CONFIG_X86_CPU_RESCTRL_INTEL_AET for the Intel specific
> parts of telemetry code. This depends on the INTEL_PMT_TELEMETRY and INTEL_TPMI
> drivers being built-in to the kernel for enumeration of telemetry features.
> 

The paragraph below seems better suited to follow right after the "The INTEL_PMT_TELEMETRY
driver enumerates support ..." paragraph.

> Call INTEL_PMT_TELEMETRY's intel_pmt_get_regions_by_feature() for each guid
> known to resctrl (using the appropriate enum pmt_feature_id argument for that
> guid) to obtain a private copy of struct pmt_feature_group that contains all

"each guid known to resctrl (using the appropriate enum pmt_feature_id argument for that
guid)" -> "each event group"?

> discovered/enumerated telemetry aggregator data for all event groups (known and
> unknown to resctrl) of that pmt_feature_id. Further processing on this structure
> will enable all supported events in resctrl.

I think the above tries to be too specific in what the code does while also trying to
explain the flow at a high level. I find the result difficult to parse. Consider something like:

	Call INTEL_PMT_TELEMETRY's intel_pmt_get_regions_by_feature() for each event group
	to obtain a private copy of that event group's aggregator data. Duplicate the aggregator
	data between event groups that have the same feature type but different guid. Further
	processing on this private copy will be unique to the event group.

	Return the aggregator data to INTEL_PMT_TELEMETRY at resctrl exit time.
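
To confirm we mean the same flow, a rough sketch of how I read it. This is not the patch code;
error handling, the return convention of intel_pmt_get_regions_by_feature(), and how an unused
copy is handed back are assumptions on my side:

static bool sketch_get_events(void)
{
	struct pmt_feature_group *p;
	struct event_group **peg;
	bool found = false;

	for_each_event_group(peg) {
		/* One private copy of aggregator data per event group. */
		p = intel_pmt_get_regions_by_feature((*peg)->feature);
		if (IS_ERR_OR_NULL(p))
			continue;

		if (enable_events(*peg, p)) {
			(*peg)->pfg = p;	/* returned at resctrl exit time */
			found = true;
		} else {
			/* hand the unused copy straight back to the driver */
		}
	}

	return found;
}

Presumably returning event_group::pfg to INTEL_PMT_TELEMETRY then happens in intel_aet_exit().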

> 
> Return the struct pmt_feature_group to INTEL_PMT_TELEMETRY at resctrl exit time.
> 
> Signed-off-by: Tony Luck <tony.luck@...el.com>
> Link: https://github.com/intel/Intel-PMT # [1]
> ---
>  arch/x86/kernel/cpu/resctrl/internal.h  |   8 ++
>  arch/x86/kernel/cpu/resctrl/core.c      |   5 +
>  arch/x86/kernel/cpu/resctrl/intel_aet.c | 122 ++++++++++++++++++++++++
>  arch/x86/Kconfig                        |  13 +++
>  arch/x86/kernel/cpu/resctrl/Makefile    |   1 +
>  5 files changed, 149 insertions(+)
>  create mode 100644 arch/x86/kernel/cpu/resctrl/intel_aet.c
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 14fadcff0d2b..886261a82b81 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -217,4 +217,12 @@ void __init intel_rdt_mbm_apply_quirk(void);
>  void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
>  void resctrl_arch_mbm_cntr_assign_set_one(struct rdt_resource *r);
>  
> +#ifdef CONFIG_X86_CPU_RESCTRL_INTEL_AET
> +bool intel_aet_get_events(void);
> +void __exit intel_aet_exit(void);
> +#else
> +static inline bool intel_aet_get_events(void) { return false; }
> +static inline void __exit intel_aet_exit(void) { }
> +#endif
> +
>  #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index af555dadf024..648f44cff52c 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -738,6 +738,9 @@ void resctrl_arch_pre_mount(void)
>  
>  	if (!atomic_try_cmpxchg(&only_once, &old, 1))
>  		return;
> +
> +	if (!intel_aet_get_events())
> +		return;
>  }
>  
>  enum {
> @@ -1095,6 +1098,8 @@ late_initcall(resctrl_arch_late_init);
>  
>  static void __exit resctrl_arch_exit(void)
>  {
> +	intel_aet_exit();
> +
>  	cpuhp_remove_state(rdt_online);
>  
>  	resctrl_exit();
> diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> new file mode 100644
> index 000000000000..02bbe7872fcf
> --- /dev/null
> +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> @@ -0,0 +1,122 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Resource Director Technology(RDT)
> + * - Intel Application Energy Telemetry
> + *
> + * Copyright (C) 2025 Intel Corporation
> + *
> + * Author:
> + *    Tony Luck <tony.luck@...el.com>
> + */
> +
> +#define pr_fmt(fmt)   "resctrl: " fmt
> +
> +#include <linux/array_size.h>
> +#include <linux/cleanup.h>
> +#include <linux/cpu.h>
> +#include <linux/err.h>
> +#include <linux/init.h>
> +#include <linux/intel_pmt_features.h>
> +#include <linux/intel_vsec.h>
> +#include <linux/overflow.h>
> +#include <linux/resctrl.h>
> +#include <linux/stddef.h>
> +#include <linux/types.h>
> +
> +#include "internal.h"
> +
> +/**
> + * struct event_group - All information about a group of telemetry events.

Trying to answer my own question from v12. How about:
"Events with the same feature type ("energy" or "perf") and guid"


> + * @feature:		Argument to intel_pmt_get_regions_by_feature() to
> + *			discover if this event_group is supported.

"discover if this event_group is supported" - this does not seem accurate (and
contradicts the @pfg description)
How about just:
	"Type of events, for example FEATURE_PER_RMID_PERF_TELEM or FEATURE_PER_RMID_ENERGY_TELEM, in this group."


> + * @pfg:		Points to the aggregated telemetry space information
> + *			returned by the intel_pmt_get_regions_by_feature()
> + *			call to the INTEL_PMT_TELEMETRY driver that contains
> + *			data for all telemetry regions of a specific type.

"of a specific type" -> "of type @feature"

> + *			Valid if the system supports the event group.
> + *			NULL otherwise.
> + * @guid:		Unique number per XML description file.
> + */
> +struct event_group {
> +	/* Data fields for additional structures to manage this group. */
> +	enum pmt_feature_id		feature;
> +	struct pmt_feature_group	*pfg;
> +
> +	/* Remaining fields initialized from XML file. */
> +	u32				guid;
> +};
> +
> +/*
> + * Link: https://github.com/intel/Intel-PMT
> + * File: xml/CWF/OOBMSM/RMID-ENERGY/cwf_aggregator.xml
> + */
> +static struct event_group energy_0x26696143 = {
> +	.feature	= FEATURE_PER_RMID_ENERGY_TELEM,
> +	.guid		= 0x26696143,
> +};
> +
> +/*
> + * Link: https://github.com/intel/Intel-PMT
> + * File: xml/CWF/OOBMSM/RMID-PERF/cwf_aggregator.xml
> + */
> +static struct event_group perf_0x26557651 = {
> +	.feature	= FEATURE_PER_RMID_PERF_TELEM,
> +	.guid		= 0x26557651,
> +};
> +
> +static struct event_group *known_event_groups[] = {
> +	&energy_0x26696143,
> +	&perf_0x26557651,
> +};

Placing all event groups in same array is a great enhancement.

> +
> +#define for_each_event_group(_peg)						\
> +	for (_peg = known_event_groups;						\
> +	     _peg < &known_event_groups[ARRAY_SIZE(known_event_groups)];	\
> +	     _peg++)
> +
> +/* Stub for now */
> +static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
> +{
> +	return false;
> +}
> +
> +/*
> + * Make a request to the INTEL_PMT_TELEMETRY driver for a copy of the
> + * pmt_feature_group for each known feature. If there is one, the returned
> + * structure has an array of telemetry_region structures, each element of
> + * the array describes one telemetry aggregator.
> + * A single pmt_feature_group may include multiple different guids.
> + * Try to use every telemetry aggregator with a known guid.

Same feedback as v12.

> + */
> +bool intel_aet_get_events(void)
> +{

Reinette
