Message-ID: <aDeCQ-v9OHzHauPi@agluck-desk3>
Date: Wed, 28 May 2025 14:38:11 -0700
From: "Luck, Tony" <tony.luck@...el.com>
To: Reinette Chatre <reinette.chatre@...el.com>
Cc: Fenghua Yu <fenghuay@...dia.com>,
	Maciej Wieczor-Retman <maciej.wieczor-retman@...el.com>,
	Peter Newman <peternewman@...gle.com>,
	James Morse <james.morse@....com>, Babu Moger <babu.moger@....com>,
	Drew Fustini <dfustini@...libre.com>,
	Dave Martin <Dave.Martin@....com>,
	Anil Keshavamurthy <anil.s.keshavamurthy@...el.com>,
	Chen Yu <yu.c.chen@...el.com>, x86@...nel.org,
	linux-kernel@...r.kernel.org, patches@...ts.linux.dev
Subject: Re: [PATCH v5 00/29] x86/resctrl telemetry monitoring

Hi Reinette,

I've begun drafting a new cover letter to explain telemetry.

Here's the introduction. Let me know if it helps cover the
gaps and ambiguities that you pointed out.

-Tony


RMID based telemetry events
---------------------------

Each CPU on a system keeps a local count of various events.

Every two milliseconds, or when the value of the RMID field in the
IA32_PQR_ASSOC MSR is changed, the CPU transmits all the event counts
together with the value of the RMID to a nearby OOBMSM (out-of-band
management services module) device. The CPU then resets all counters and
begins counting events for the new RMID or time interval.

The OOBMSM device adds each event count to those received from other
CPUs, keeping a running total for each event for each RMID.

The operating system can read these counts to gather a picture of
system-wide activity for each of the logged events per-RMID.
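
As a rough illustration of the flow described above, here is a toy
software model of the flush-and-aggregate behaviour. This is only a
sketch: the class and method names (Oobmsm, Cpu, set_rmid, timer_tick)
are made up for this example and do not correspond to any kernel or
hardware interface. The real aggregation happens in hardware.

```python
from collections import defaultdict


class Oobmsm:
    """Hypothetical model of the aggregator: one running total per (RMID, event)."""

    def __init__(self):
        self.totals = defaultdict(int)

    def receive(self, rmid, counts):
        # Add the counts sent by one CPU to the running totals for this RMID.
        for event, value in counts.items():
            self.totals[(rmid, event)] += value

    def read(self, rmid, event):
        # What the operating system would see when it reads a counter.
        return self.totals.get((rmid, event), 0)


class Cpu:
    """Hypothetical model of one CPU keeping local event counts."""

    def __init__(self, oobmsm, rmid=0):
        self.oobmsm = oobmsm
        self.rmid = rmid
        self.counts = defaultdict(int)

    def count(self, event, value):
        # Local counting while running with the current RMID.
        self.counts[event] += value

    def _flush(self):
        # Transmit all counts tagged with the current RMID, then reset them.
        self.oobmsm.receive(self.rmid, dict(self.counts))
        self.counts.clear()

    def set_rmid(self, new_rmid):
        # Models a change of the RMID field: flush first, then switch.
        if new_rmid != self.rmid:
            self._flush()
            self.rmid = new_rmid

    def timer_tick(self):
        # Models the periodic (two millisecond) transmission.
        self._flush()
```

In this model, two CPUs that each run tasks with RMID 5 contribute to
the same total, and a CPU that switches from RMID 5 to RMID 7 hands over
its RMID 5 counts before it starts counting for RMID 7.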

E.g. the operating system may assign RMID 5 to all the tasks running to
perform a certain job. When it reads the core energy event counter for
RMID 5 it will see the total energy consumed by CPU cores for all tasks
in that job while running on any CPU. This is a much lower overhead
mechanism to track events per job than the typical "perf" approach
of reading counters on every context switch.

Events
------

"core energy" The number of Joules consumed by CPU cores during execution
of instructions for the current RMID.
Note that this does not include energy used by the "uncore" (LLC
and interfaces to off-package devices) or energy used by memory or I/O
devices. Energy may be calculated based on measures of activity rather
than the output from a power meter.
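
Since the event is a running total in Joules, a consumer would
typically take two readings and divide the difference by the elapsed
time to get average power (Joules per second is Watts). A minimal
sketch, assuming the two readings are already in Joules; the function
name is hypothetical and how the readings are obtained is not shown:

```python
def average_power_watts(joules_start, joules_end, seconds):
    """Average core power for one RMID between two counter readings.

    joules_start / joules_end: two readings of the running "core energy"
    total for the same RMID (assumed to be in Joules).
    seconds: wall-clock time between the two readings.
    """
    if seconds <= 0:
        raise ValueError("interval must be positive")
    return (joules_end - joules_start) / seconds
```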

"activity" The dynamic capacitance (Cdyn) in Farads for a core due to
execution of instructions for the current RMID. This event will be
more useful to a user interested in optimizing energy consumption
of a workload because it is invariant to frequency changes (e.g.
turbo mode) that may be outside of the control of the developer.
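
To see why a Cdyn-based value is independent of frequency, consider the
usual first-order model for CMOS dynamic power, P = Cdyn * V^2 * f. The
sketch below uses that textbook model only. It is an assumption for
illustration, not a description of how the hardware derives either
event, and the function names and numbers are invented.

```python
def dynamic_power_watts(cdyn_farads, volts, freq_hz):
    """First-order dynamic power model: P = Cdyn * V^2 * f."""
    return cdyn_farads * volts ** 2 * freq_hz


def energy_joules(cdyn_farads, volts, freq_hz, seconds):
    """Energy used over an interval at a fixed operating point."""
    return dynamic_power_watts(cdyn_farads, volts, freq_hz) * seconds


# Same workload (same Cdyn) at two made-up operating points.
CDYN = 1e-9  # Farads, illustrative value only
base_energy = energy_joules(CDYN, volts=0.8, freq_hz=2e9, seconds=1.0)
turbo_energy = energy_joules(CDYN, volts=1.0, freq_hz=3e9, seconds=1.0)
```

Under this model the energy for the same interval differs between the
two operating points even though Cdyn, the property of the workload
itself, is unchanged.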
