Message-ID: <aGMT35DktIVkzwjx@agluck-desk3>
Date: Mon, 30 Jun 2025 15:46:55 -0700
From: "Luck, Tony" <tony.luck@...el.com>
To: Reinette Chatre <reinette.chatre@...el.com>
Cc: Fenghua Yu <fenghuay@...dia.com>,
Maciej Wieczor-Retman <maciej.wieczor-retman@...el.com>,
Peter Newman <peternewman@...gle.com>,
James Morse <james.morse@....com>, Babu Moger <babu.moger@....com>,
Drew Fustini <dfustini@...libre.com>,
Dave Martin <Dave.Martin@....com>,
Anil Keshavamurthy <anil.s.keshavamurthy@...el.com>,
Chen Yu <yu.c.chen@...el.com>, x86@...nel.org,
linux-kernel@...r.kernel.org, patches@...ts.linux.dev
Subject: Re: [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring
On Mon, Jun 30, 2025 at 10:51:50AM -0700, Reinette Chatre wrote:
>
> Tony,
>
> On 6/26/25 9:49 AM, Tony Luck wrote:
> > Background
> > ----------
> >
> > Telemetry features are being implemented in conjunction with the
> > IA32_PQR_ASSOC.RMID value on each logical CPU. This is used to send
> > counts for various events to a collector in a nearby OOBMSM device to be
> > accumulated with counts for each <RMID, event> pair received from other
> > CPUs. Cores send event counts when the RMID value changes, or after each
> > 2ms elapsed time.
>
> To start a review of this jumbo series and find that the *first* [1]
> (straight forward) request from previous review has not been addressed is
> demoralizing. I was hoping that the previous version's discussions would result
> in review feedback either addressed or discussed (never ignored). I
> cannot imagine how requesting OOBMSM to be expanded can be invalid though.
>
> Reinette
>
> [1] https://lore.kernel.org/lkml/b8ddce03-65c0-4420-b30d-e43c54943667@intel.com/

My profound apologies for blowing it (again). I went through the comments
on the patches multiple times to try to catch everything you raised, but
somehow skipped the cover letter :-(

Here's a rewrite that addresses your comments and also tries to provide
a better story line, starting with how the logical processors capture
the event data and following on with aggregator processing, etc.

-Tony
---
On Intel systems that support per-RMID telemetry monitoring, each logical
processor keeps a local count for various events. When the IA32_PQR_ASSOC.RMID
value for the logical processor changes (or when a two millisecond timer
expires), these event counts are transmitted, together with the current RMID
value, to an event aggregator on the same package as the processor. The
local event counters are then reset to zero to begin counting again.

Each aggregator adds the incoming event counts to its cumulative count
for each <RMID, event> pair. Note that there can be multiple aggregators
on each package, with no architectural association between logical
processors and aggregators.
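
As an illustration only, the capture and aggregation flow described above
can be modeled in C. This is a minimal sketch, not hardware or kernel code.
The sizes (576 RMIDs, two events), the structures, and the function names
(cpu_set_rmid(), cpu_timer_expired(), aggregator_receive()) are hypothetical.
The model also assumes that counts flushed on an RMID change are credited to
the outgoing RMID.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, chosen for illustration only. */
#define NUM_RMIDS	576
#define NUM_EVENTS	2

/* Per-logical-processor state: local counts are reset after each transmit. */
struct cpu_state {
	uint32_t rmid;			/* current IA32_PQR_ASSOC.RMID */
	uint64_t local[NUM_EVENTS];	/* counts since the last transmit */
};

/* One aggregator: a cumulative count for every <RMID, event> pair. */
struct aggregator {
	uint64_t total[NUM_RMIDS][NUM_EVENTS];
};

/* Aggregator side: add an incoming report to the cumulative counts. */
static void aggregator_receive(struct aggregator *agg, uint32_t rmid,
			       const uint64_t counts[NUM_EVENTS])
{
	assert(rmid < NUM_RMIDS);
	for (int e = 0; e < NUM_EVENTS; e++)
		agg->total[rmid][e] += counts[e];
}

/* CPU side: send local counts tagged with the current RMID, then reset them. */
static void cpu_flush(struct cpu_state *cpu, struct aggregator *agg)
{
	aggregator_receive(agg, cpu->rmid, cpu->local);
	memset(cpu->local, 0, sizeof(cpu->local));
}

/* Models a write of a new RMID to IA32_PQR_ASSOC (assumption: flush first). */
static void cpu_set_rmid(struct cpu_state *cpu, struct aggregator *agg,
			 uint32_t new_rmid)
{
	cpu_flush(cpu, agg);	/* counts so far belong to the old RMID */
	cpu->rmid = new_rmid;
}

/* Models expiry of the two millisecond timer. */
static void cpu_timer_expired(struct cpu_state *cpu, struct aggregator *agg)
{
	cpu_flush(cpu, agg);
}
```

In this model, a processor can report to any aggregator because
aggregator_receive() takes the aggregator as a parameter. That mirrors the
point above that there is no architectural association between logical
processors and aggregators.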

An operating system can read all of these aggregated counters from the
MMIO space of the Out Of Band Management Service Module (OOBMSM)
device(s) on the system. Any counter can be read from any logical processor.

For each processor generation, Intel publishes XML files describing which
events are counted by each logical processor and the offset of each
accumulated counter value within the MMIO space:
https://github.com/intel/Intel-PMT.

For example, there are two energy-related telemetry events for the Clearwater
Forest family of processors, and the MMIO space looks like this:

    Offset  RMID  Event
    ------  ----  -----------
    0x0000     0  core_energy
    0x0008     0  activity
    0x0010     1  core_energy
    0x0018     1  activity
    ...
    0x23F0   575  core_energy
    0x23F8   575  activity
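
The layout in this table follows a simple formula: each RMID owns a block of
two 64-bit counters, one per event. The following is a sketch of the offset
calculation. It assumes 8-byte counters and the event order shown in the
table. The CWF_* names and cwf_counter_offset() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Event order within each RMID's block (assumed from the table). */
enum cwf_event { CWF_CORE_ENERGY = 0, CWF_ACTIVITY = 1, CWF_NUM_EVENTS = 2 };

#define CWF_NUM_RMIDS		576
#define CWF_COUNTER_BYTES	8	/* each counter is a 64-bit value */

/* Byte offset of the <rmid, event> counter from the MMIO base address. */
static uint64_t cwf_counter_offset(unsigned int rmid, enum cwf_event event)
{
	assert(rmid < CWF_NUM_RMIDS && event < CWF_NUM_EVENTS);
	return ((uint64_t)rmid * CWF_NUM_EVENTS + event) * CWF_COUNTER_BYTES;
}
```

For example, cwf_counter_offset(575, CWF_ACTIVITY) yields 0x23F8, which
matches the last row of the table.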

In addition, the XML file provides the units (Joules for core_energy,
Farads for activity) and the data type (fixed-point binary, with bit 63
used to indicate that the data is valid and the low 18 bits holding a
binary fraction).

Finally, each XML file provides a 32-bit unique identifier (guid) that is
used as an index to find the correct XML description file for each
telemetry implementation.

The INTEL_PMT_DISCOVERY driver provides intel_pmt_get_regions_by_feature()
to enumerate the aggregator instances on a platform. For each instance it
provides:
1) guid - so resctrl can determine which events are supported
2) MMIO base address of the counters
3) package id
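
The guid is what lets resctrl match an aggregator to an event description.
The following sketch shows that matching. It uses a hypothetical struct
telem_region as a stand-in for the information returned by
intel_pmt_get_regions_by_feature(); the real structure and the function's
signature may differ. It also uses a hypothetical table of known
descriptions. find_desc() and count_usable() are hypothetical as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one enumerated aggregator instance. */
struct telem_region {
	uint32_t guid;		/* selects the XML event description */
	uintptr_t mmio_base;	/* base address of the counters */
	unsigned int package;	/* package id */
};

/* Hypothetical summary of one known telemetry implementation. */
struct telem_desc {
	uint32_t guid;
	unsigned int num_rmids;
	unsigned int num_events;
};

/* Look up the description for @guid; NULL if no description is known. */
static const struct telem_desc *find_desc(const struct telem_desc *known,
					  size_t nknown, uint32_t guid)
{
	for (size_t i = 0; i < nknown; i++)
		if (known[i].guid == guid)
			return &known[i];
	return NULL;
}

/* Count aggregators on @package whose guid has a known description. */
static size_t count_usable(const struct telem_region *r, size_t nr,
			   const struct telem_desc *known, size_t nknown,
			   unsigned int package)
{
	size_t n = 0;

	assert(r != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++)
		if (r[i].package == package &&
		    find_desc(known, nknown, r[i].guid))
			n++;
	return n;
}
```
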
Resctrl accumulates counts from all aggregators on a package in order
to provide a consistent user interface across processor generations.
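
Summing across the aggregators on a package could be sketched as shown
below. The counters are modeled as plain arrays that stand in for the mapped
MMIO regions. struct agg_view, package_sum(), and the choice to fail the
whole read if any contributing counter lacks the valid bit are assumptions
for illustration. They are not resctrl's actual behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VALID_BIT	(1ULL << 63)

/* Hypothetical view of one aggregator's counters and its package id. */
struct agg_view {
	const volatile uint64_t *counters;	/* stands in for mapped MMIO */
	unsigned int package;
};

/*
 * Sum the fixed-point counts for one <rmid, event> pair over every
 * aggregator on @package. All values share the same binary-fraction
 * scaling, so the raw values (minus the valid bit) can be added directly.
 * Returns -1 if any contributing counter is not valid, 0 otherwise.
 */
static int package_sum(const struct agg_view *aggs, size_t naggs,
		       unsigned int package, unsigned int rmid,
		       unsigned int event, unsigned int num_events,
		       uint64_t *sum)
{
	assert(sum != NULL);
	*sum = 0;
	for (size_t i = 0; i < naggs; i++) {
		if (aggs[i].package != package)
			continue;
		uint64_t raw = aggs[i].counters[rmid * num_events + event];
		if (!(raw & VALID_BIT))
			return -1;
		*sum += raw & ~VALID_BIT;
	}
	return 0;
}
```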

The directory structure for the telemetry events looks like this:
$ tree /sys/fs/resctrl/mon_data/
/sys/fs/resctrl/mon_data/
├── mon_PERF_PKG_00
│ ├── activity
│ └── core_energy
└── mon_PERF_PKG_01
├── activity
└── core_energy

Reading the "core_energy" file from a resctrl mon_data directory shows
the cumulative energy (in Joules) used by all tasks that ran with the RMID
associated with that directory on a given package. Note that "core_energy"
reports only energy consumed by CPU cores (data processing units, L1/L2
caches, etc.). It does not include energy used in the "uncore" (L3 cache,
on-package devices, etc.) or by memory or I/O devices.
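
Because core_energy is cumulative, the average power over an interval is the
change in energy divided by the elapsed time (Joules per second, i.e. watts).
The following is a sketch. It assumes two readings taken a known number of
seconds apart and no counter wrap in between. average_watts() is
hypothetical.

```c
#include <assert.h>

/*
 * Average power in watts from two cumulative core_energy readings
 * (in Joules) taken @seconds apart. Assumes no wrap between readings.
 */
static double average_watts(double joules_start, double joules_end,
			    double seconds)
{
	assert(seconds > 0.0);
	assert(joules_end >= joules_start);
	return (joules_end - joules_start) / seconds;
}
```

For example, readings of 100.5 J and 150.5 J taken 2 seconds apart give an
average of 25 W.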