Message-ID: <f3ba783a-6387-4997-9e8c-897109ee3559@intel.com>
Date: Tue, 8 Jul 2025 13:50:39 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: "Luck, Tony" <tony.luck@...el.com>
CC: Fenghua Yu <fenghuay@...dia.com>, Maciej Wieczor-Retman
<maciej.wieczor-retman@...el.com>, Peter Newman <peternewman@...gle.com>,
James Morse <james.morse@....com>, Babu Moger <babu.moger@....com>, "Drew
Fustini" <dfustini@...libre.com>, Dave Martin <Dave.Martin@....com>, "Anil
Keshavamurthy" <anil.s.keshavamurthy@...el.com>, Chen Yu
<yu.c.chen@...el.com>, <x86@...nel.org>, <linux-kernel@...r.kernel.org>,
<patches@...ts.linux.dev>
Subject: Re: [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring
Hi Tony,

On 6/30/25 3:46 PM, Luck, Tony wrote:
> On Mon, Jun 30, 2025 at 10:51:50AM -0700, Reinette Chatre wrote:
>>
>> Tony,
>>
>> On 6/26/25 9:49 AM, Tony Luck wrote:
>>> Background
>>> ----------
>>>
>>> Telemetry features are being implemented in conjunction with the
>>> IA32_PQR_ASSOC.RMID value on each logical CPU. This is used to send
>>> counts for various events to a collector in a nearby OOBMSM device to be
>>> accumulated with counts for each <RMID, event> pair received from other
>>> CPUs. Cores send event counts when the RMID value changes, or after each
>>> 2ms elapsed time.
>>
>> To start a review of this jumbo series and find that the *first* [1]
>> (straightforward) request from previous review has not been addressed is
>> demoralizing. I was hoping that the previous version's discussions would result
>> in review feedback either addressed or discussed (never ignored). I
>> cannot imagine how requesting OOBMSM to be expanded can be invalid though.
>>
>> Reinette
>>
>> [1] https://lore.kernel.org/lkml/b8ddce03-65c0-4420-b30d-e43c54943667@intel.com/
>
> My profound apologies for blowing it (again). I went through the comments
> to patches multiple times to try and catch all your comments. But somehow
> skipped the cover letter :-( .
>
> Here's a re-write to address comments, but also to try to provide
> a better story line starting with how the logical processors capture
> the event data, following on with aggregator processing, etc.
>
> -Tony
>
> ---
>
> On Intel systems that support per-RMID telemetry monitoring each logical
> processor keeps a local count for various events. When the IA32_PQR_ASSOC.RMID
> value for the logical processor changes (or when a two millisecond counter
> expires) these event counts are transmitted to an event aggregator on
> the same package as the processor together with the current RMID value. The
> event counters are reset to zero to begin counting again.
>
> Each aggregator takes the incoming event counts and adds them to
> cumulative counts for each event for each RMID. Note that there can be
> multiple aggregators on each package with no architectural association
> between logical processors and an aggregator.
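The flow in the two paragraphs above can be modeled in a few lines. This is only a sketch of the description, not of the hardware or of resctrl: the class and method names (Aggregator, LogicalCpu, flush, set_rmid) are hypothetical, and the two millisecond timer is represented by calling flush() directly.

```python
from collections import defaultdict


class Aggregator:
    """Hypothetical model of one aggregator: cumulative count per (RMID, event)."""

    def __init__(self):
        self.totals = defaultdict(int)

    def receive(self, rmid, counts):
        # Add the incoming counts to the running totals for this RMID.
        for event, value in counts.items():
            self.totals[(rmid, event)] += value


class LogicalCpu:
    """Hypothetical model of a logical processor with local event counters."""

    def __init__(self, rmid):
        self.rmid = rmid
        self.local = defaultdict(int)

    def count(self, event, value):
        self.local[event] += value

    def flush(self, aggregator):
        # Send local counts together with the current RMID, then reset to zero.
        # The aggregator is a parameter because the text states there is no
        # architectural association between a processor and an aggregator.
        aggregator.receive(self.rmid, dict(self.local))
        self.local.clear()

    def set_rmid(self, new_rmid, aggregator):
        # A change of RMID triggers a flush under the old RMID first.
        if new_rmid != self.rmid:
            self.flush(aggregator)
            self.rmid = new_rmid
```

In this model, counts gathered before an RMID change are credited to the old RMID and later counts to the new one.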
>
> All of these aggregated counters can be read by an operating system from
> the MMIO space of the Out Of Band Management Service Module (OOBMSM)
> device(s) on a system. Any counter can be read from any logical processor.
>
> Intel publishes details for each processor generation showing which
> events are counted by each logical processor and the offsets for each
> accumulated counter value within the MMIO space in XML files here:
> https://github.com/intel/Intel-PMT.
>
> For example there are two energy related telemetry events for the Clearwater
> Forest family of processors and the MMIO space looks like this:
>
> Offset  RMID  Event
> ------  ----  -----
> 0x0000  0     core_energy
> 0x0008  0     activity
> 0x0010  1     core_energy
> 0x0018  1     activity
> ...
> 0x23F0  575   core_energy
> 0x23F8  575   activity
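The layout in the table above follows a simple stride, which can be checked with a short sketch. The event order and the 8-byte counter size are read off the example table only; for a real platform they would have to come from the XML description, and counter_offset() is a hypothetical helper name.

```python
EVENTS = ("core_energy", "activity")  # order as shown in the example table
COUNTER_BYTES = 8                     # stride between adjacent counters in the table


def counter_offset(rmid, event, events=EVENTS):
    """Byte offset of the counter for (rmid, event), assuming the table's layout."""
    return (rmid * len(events) + events.index(event)) * COUNTER_BYTES
```

With these assumptions counter_offset(575, "activity") evaluates to 0x23F8, matching the last row of the table.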
>
> In addition the XML file provides the units (Joules for core_energy,
> Farads for activity) and the type of data (fixed-point binary with
> bit 63 used as to indicate the data is valid, and the low 18 bits as a
"bit 63 used as to indicate" -> "bit 63 used to indicate"?
> binary fraction).
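The data format described above (bit 63 as the valid flag, low 18 bits as a binary fraction) could be decoded roughly as follows. This is a sketch under the assumption that all bits below bit 63 form one unsigned fixed-point value; decode_counter() is a hypothetical name, not a function from the series.

```python
VALID_BIT = 1 << 63   # bit 63 indicates the data is valid
FRACTION_BITS = 18    # low 18 bits are the binary fraction


def decode_counter(raw):
    """Return the counter as a float, or None when the valid bit is clear."""
    if not raw & VALID_BIT:
        return None
    value = raw & (VALID_BIT - 1)          # drop the valid bit
    return value / (1 << FRACTION_BITS)    # scale by 2**18
```

For example, a raw value with bit 63 set and only bit 17 set below it decodes to 0.5.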
>
> Finally, each XML file provides a 32-bit unique id (or guid) that is
> used as an index to find the correct XML description file for each
> telemetry implementation.
>
> The INTEL_PMT_DISCOVERY driver provides intel_pmt_get_regions_by_feature()
> to enumerate the aggregator instances on a platform. It provides:
I think it will be helpful to prime the connection between "aggregator"
and "telemetry region" here. For example,
"to enumerate the aggregator instances on a platform" -> "to enumerate
the aggregator instances (also referred to as "telemetry regions" in this series)
on a platform"
> 1) guid - so resctrl can determine which events are supported
> 2) mmio base address of counters
mmio -> MMIO
> 3) package id
>
> Resctrl accumulates counts from all aggregators on a package in order
> to provide a consistent user interface across processor generations.
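The per-package accumulation described above amounts to summing one counter across every aggregator that reports the same package id. A minimal sketch, assuming already decoded counter values: Region is a hypothetical stand-in for what the enumeration returns, and its counters dict stands in for MMIO reads.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """Hypothetical stand-in for one enumerated aggregator instance."""
    guid: int
    package_id: int
    counters: dict  # (rmid, event) -> decoded value, in place of MMIO reads


def package_total(regions, package_id, rmid, event):
    """Sum one (rmid, event) counter over all aggregators on a package."""
    return sum(
        region.counters.get((rmid, event), 0)
        for region in regions
        if region.package_id == package_id
    )
```

Summing in this way hides the number of aggregators per package from the user, which is what keeps the interface the same across processor generations.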
>
> Directory structure for the telemetry events looks like this:
>
> $ tree /sys/fs/resctrl/mon_data/
> /sys/fs/resctrl/mon_data/
> mon_data
> ├── mon_PERF_PKG_00
> │   ├── activity
> │   └── core_energy
> └── mon_PERF_PKG_01
>     ├── activity
>     └── core_energy
>
> Reading the "core_energy" file from some resctrl mon_data directory shows
> the cumulative energy (in Joules) used by all tasks that ran with the RMID
> associated with that directory on a given package. Note that "core_energy"
> reports only energy consumed by CPU cores (data processing units,
> L1/L2 caches, etc.). It does not include energy used in the "uncore"
> (L3 cache, on package devices, etc.), or used by memory or I/O devices.
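From user space, reading one of these files could look like the sketch below. The directory and file names are taken from the tree above. The file contents are an assumption: the sketch expects a single decimal number and returns None for any other text, and read_event() is a hypothetical helper.

```python
from pathlib import Path


def read_event(mon_data, domain, event):
    """Read one event file, e.g. <mon_data>/mon_PERF_PKG_00/core_energy."""
    text = (Path(mon_data) / domain / event).read_text().strip()
    try:
        return float(text)   # assumed format: one decimal number
    except ValueError:
        return None          # non-numeric content is treated as "no value"
```

A caller might use read_event("/sys/fs/resctrl/mon_data", "mon_PERF_PKG_00", "core_energy") for the default group.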
Thank you very much for this rework. I found this much easier to follow.

Reinette