Message-ID: <c1d03d2c-9f5d-4fcf-91ba-dfe2c5907292@intel.com>
Date: Thu, 14 Aug 2025 16:57:47 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Tony Luck <tony.luck@...el.com>, Fenghua Yu <fenghuay@...dia.com>, "Maciej
Wieczor-Retman" <maciej.wieczor-retman@...el.com>, Peter Newman
<peternewman@...gle.com>, James Morse <james.morse@....com>, Babu Moger
<babu.moger@....com>, Drew Fustini <dfustini@...libre.com>, Dave Martin
<Dave.Martin@....com>, Chen Yu <yu.c.chen@...el.com>
CC: <x86@...nel.org>, <linux-kernel@...r.kernel.org>,
<patches@...ts.linux.dev>
Subject: Re: [PATCH v8 00/32] x86,fs/resctrl telemetry monitoring
Hi Tony,
On 8/11/25 11:16 AM, Tony Luck wrote:
> Background
> ----------
> On Intel systems that support per-RMID telemetry monitoring each logical
> processor keeps a local count for various events. When the
> IA32_PQR_ASSOC.RMID value for the logical processor changes (or when a
> two millisecond counter expires) these event counts are transmitted to
> an event aggregator on the same package as the processor together with
> the current RMID value. The event counters are reset to zero to begin
> counting again.
>
> Each aggregator takes the incoming event counts and adds them to
> cumulative counts for each event for each RMID. Note that there can be
> multiple aggregators on each package with no architectural association
> between logical processors and an aggregator.
>
> All of these aggregated counters can be read by an operating system from
> the MMIO space of the Out Of Band Management Service Module (OOBMSM)
> device(s) on a system. Any counter can be read from any logical processor.
>
> Intel publishes details for each processor generation showing which
> events are counted by each logical processor and the offsets for each
> accumulated counter value within the MMIO space in XML files here:
> https://github.com/intel/Intel-PMT.
>
> For example there are two energy related telemetry events for the
> Clearwater Forest family of processors and the MMIO space looks like this:
>
> Offset  RMID  Event
> ------  ----  -----------
> 0x0000     0  core_energy
> 0x0008     0  activity
> 0x0010     1  core_energy
> 0x0018     1  activity
> ...
> 0x23F0   575  core_energy
> 0x23F8   575  activity
>
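A minimal sketch of the offset arithmetic implied by the table above. It assumes every RMID owns a contiguous group of 8-byte counters, one per event, in a fixed event order. The helper and enum names are hypothetical and not taken from the series:

```c
#include <stdint.h>

#define PMT_COUNTER_SIZE 8u /* each counter is one 64-bit value */

/* Hypothetical event indices, in the per-RMID order shown in the table */
enum cwf_event {
	CWF_CORE_ENERGY = 0,
	CWF_ACTIVITY    = 1,
	CWF_NUM_EVENTS  = 2,
};

/* Byte offset of counter @evt_idx for @rmid, given @num_events per RMID */
static inline uint64_t counter_offset(uint32_t rmid, uint32_t evt_idx,
				      uint32_t num_events)
{
	return ((uint64_t)rmid * num_events + evt_idx) * PMT_COUNTER_SIZE;
}
```

With two events per RMID this yields 0x23F0 for RMID 575 core_energy and 0x23F8 for RMID 575 activity, matching the last rows of the table.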
> In addition the XML file provides the units (Joules for core_energy,
> Farads for activity) and the type of data (fixed-point binary with
> bit 63 used to indicate the data is valid, and the low 18 bits as a
> binary fraction).
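A sketch of decoding one raw counter per that description, assuming bits 62:18 hold the integer part. The macro and function names are hypothetical. It uses integer arithmetic only, since kernel code avoids floating point:

```c
#include <stdbool.h>
#include <stdint.h>

#define PMT_VALID_BIT (UINT64_C(1) << 63) /* bit 63: counter data is valid */
#define PMT_FRAC_BITS 18                  /* low 18 bits: binary fraction */
#define PMT_FRAC_MASK ((UINT64_C(1) << PMT_FRAC_BITS) - 1)

/*
 * Hypothetical decoder. Returns false when the valid bit is clear.
 * Otherwise stores the integer part in *whole and the fractional part,
 * truncated to millionths, in *frac_micro.
 */
static bool pmt_decode(uint64_t raw, uint64_t *whole, uint32_t *frac_micro)
{
	if (!(raw & PMT_VALID_BIT))
		return false;

	raw &= ~PMT_VALID_BIT;          /* drop the valid bit */
	*whole = raw >> PMT_FRAC_BITS;  /* bits 62:18, assumed integer part */
	/* fraction < 2^18, so fraction * 10^6 < 2^38 and cannot overflow */
	*frac_micro = (uint32_t)(((raw & PMT_FRAC_MASK) * 1000000u) >> PMT_FRAC_BITS);
	return true;
}
```

A caller could then print the value as "%llu.%06u" without any floating point.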
>
> Finally, each XML file provides a 32-bit unique id (or guid) that is
> used as an index to find the correct XML description file for each
> telemetry implementation.
>
> The INTEL_PMT_DISCOVERY driver provides intel_pmt_get_regions_by_feature()
> to enumerate the aggregator instances (also referred to as "telemetry
> regions" in this series) on a platform. It provides:
>
> 1) guid - so resctrl can determine which events are supported
> 2) MMIO base address of counters
> 3) package id
>
> Resctrl accumulates counts from all aggregators on a package in order
> to provide a consistent user interface across processor generations.
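A minimal sketch of that per-package accumulation, assuming every region with a given guid exposes the same counter layout. The struct and function are hypothetical stand-ins: real code would read each counter from MMIO (for example with readq()) rather than from an array. Treating any invalid counter as a failed read is also an assumption:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PMT_VALID_BIT (UINT64_C(1) << 63)

/* Hypothetical stand-in for one telemetry region returned by discovery */
struct telem_region {
	uint32_t guid;            /* selects the XML description (event layout) */
	int package_id;           /* package the aggregator lives on */
	const uint64_t *counters; /* stand-in for the MMIO counter base */
};

/*
 * Sum counter @idx over all regions matching @guid on package @pkg.
 * All values share the same 18-bit fraction scaling, so the fixed-point
 * magnitudes can be added before decoding. Returns false if any matching
 * counter is invalid or no region matches.
 */
static bool pkg_sum(const struct telem_region *r, size_t n, uint32_t guid,
		    int pkg, size_t idx, uint64_t *sum)
{
	uint64_t total = 0;
	bool found = false;

	for (size_t i = 0; i < n; i++) {
		if (r[i].guid != guid || r[i].package_id != pkg)
			continue;
		uint64_t raw = r[i].counters[idx];
		if (!(raw & PMT_VALID_BIT))
			return false;
		total += raw & ~PMT_VALID_BIT;
		found = true;
	}
	*sum = total;
	return found;
}
```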
>
> Directory structure for the telemetry events looks like this:
>
> $ tree /sys/fs/resctrl/mon_data/
> /sys/fs/resctrl/mon_data/
> ├── mon_PERF_PKG_00
> │   ├── activity
> │   └── core_energy
> └── mon_PERF_PKG_01
>     ├── activity
>     └── core_energy
>
> Reading the "core_energy" file from some resctrl mon_data directory shows
> the cumulative energy (in Joules) used by all tasks that ran with the RMID
> associated with that directory on a given package. Note that "core_energy"
> reports only energy consumed by CPU cores (data processing units,
> L1/L2 caches, etc.). It does not include energy used in the "uncore"
> (L3 cache, on package devices, etc.), or used by memory or I/O devices.
>
>
I think this series is close to being ready to pass to the x86 maintainers.
To that end I focused a lot on the changelogs with the goal to meet the
tip requirements that mostly involved switching to imperative tone. I do not
expect that I found all the cases though (and I may also have made some mistakes
in my suggested text!) so please ensure the changelogs are in imperative tone
and use consistent terms throughout the series.
If you disagree with any feedback or if any of the feedback is unclear please
let us discuss before you spin a new version. Of course it is not required
that you follow all feedback but if you don't I would like to learn why.
Please note that I will be offline next week.
Reinette