linux-kernel - Re: [PATCH v5 00/29] x86/resctrl telemetry monitoring

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <b8ddce03-65c0-4420-b30d-e43c54943667@intel.com>
Date: Wed, 28 May 2025 10:21:01 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Tony Luck <tony.luck@...el.com>, Fenghua Yu <fenghuay@...dia.com>, "Maciej
 Wieczor-Retman" <maciej.wieczor-retman@...el.com>, Peter Newman
	<peternewman@...gle.com>, James Morse <james.morse@....com>, Babu Moger
	<babu.moger@....com>, Drew Fustini <dfustini@...libre.com>, Dave Martin
	<Dave.Martin@....com>, Anil Keshavamurthy <anil.s.keshavamurthy@...el.com>,
	Chen Yu <yu.c.chen@...el.com>
CC: <x86@...nel.org>, <linux-kernel@...r.kernel.org>,
	<patches@...ts.linux.dev>
Subject: Re: [PATCH v5 00/29] x86/resctrl telemetry monitoring

Hi Tony,

On 5/21/25 3:50 PM, Tony Luck wrote:
> Background
> ----------
> 
> Telemetry features are being implemented in conjunction with the
> IA32_PQR_ASSOC.RMID value on each logical CPU. This is used to send
> counts for various events to a collector in a nearby OOBMSM device to be

(could you please expand what "OOBMSM" means somewhere?)

> accumulated with counts for each <RMID, event> pair received from other
> CPUs. Cores send event counts when the RMID value changes, or after each
> 2ms elapsed time.

Could you please use consistent terminology? The short paragraph above
uses "logical CPU", "CPU", and "core" seemingly interchangeably and that is
confusing since these terms mean different things on x86 (re.
Documentation/arch/x86/topology.rst). (more below)

> 
> Each OOBMSM device may implement multiple event collectors with each
> servicing a subset of the logical CPUs on a package.  In the initial

("logical CPU" ... but seems to be used in same context as "core" in
first paragraph?)

> hardware implementation, there are two categories of events: energy
> and perf.
> 
> 1) Energy - Two counters
> core_energy: This is an estimate of Joules consumed by each core. It is
> calculated based on the types of instructions executed, not from a power
> meter. This counter is useful to understand how much energy a workload
> is consuming.

With RMIDs being per logical CPU it is not obvious to me how these
events should be treated since most of them are described as core events. 

If the RMID is per logical CPU and the events are per core then how should
the counters be interpreted? Would the user, for example, need to set CPU
affinity to ensure only tasks within same monitor group are run on the same
cores? How else can it be ensured the data reflects the monitor group it
is reported for?

> activity: This measures "accumulated dynamic capacitance". Users who
> want to optimize energy consumption for a workload may use this rather
> than core_energy because it provides consistent results independent of
> any frequency or voltage changes that may occur during the runtime of
> the application (e.g. entry/exit from turbo mode).

(No scope for this event.)

> 
> 2) Performance - Seven counters
> These are similar events to those available via the Linux "perf" tool,
> but collected in a way with much lower overhead (no need to collect data
> on every context switch).
> 
> stalls_llc_hit - Counts the total number of unhalted core clock cycles
> when the core is stalled due to a demand load miss which hit in the LLC

(core scope)

> 
> c1_res - Counts the total C1 residency across all cores. The underlying
> counter increments on 100MHz clock ticks

("across all cores" ... package scope?)

> 
> unhalted_core_cycles - Counts the total number of unhalted core clock
> cycles

(core)

> 
> stalls_llc_miss - Counts the total number of unhalted core clock cycles
> when the core is stalled due to a demand load miss which missed all the
> local caches

(core)

> 
> c6_res - Counts the total C6 residency. The underlying counter increments
> on crystal clock (25MHz) ticks
> 
> unhalted_ref_cycles - Counts the total number of unhalted reference clock
> (TSC) cycles
> 
> uops_retired - Counts the total number of uops retired

(no scope in descriptions of the above)

> 
> The counters are arranged in groups in MMIO space of the OOBMSM device.
> E.g. for the energy counters the layout is:
> 
> Offset: Counter
> 0x00	core energy for RMID 0
> 0x08	core activity for RMID 0
> 0x10	core energy for RMID 1
> 0x18	core activity for RMID 1
> ...

Does seems to hint that counters/events are always per core (but the descriptions
do not always reflect that) while RMID is per logical-CPU. 

Reinette