Message-ID: <1d6606fd-5047-4286-ac69-0dfe4de1b844@intel.com>
Date: Wed, 28 May 2025 15:21:42 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: "Luck, Tony" <tony.luck@...el.com>
CC: Fenghua Yu <fenghuay@...dia.com>, Maciej Wieczor-Retman
<maciej.wieczor-retman@...el.com>, Peter Newman <peternewman@...gle.com>,
James Morse <james.morse@....com>, Babu Moger <babu.moger@....com>, "Drew
Fustini" <dfustini@...libre.com>, Dave Martin <Dave.Martin@....com>, "Anil
Keshavamurthy" <anil.s.keshavamurthy@...el.com>, Chen Yu
<yu.c.chen@...el.com>, <x86@...nel.org>, <linux-kernel@...r.kernel.org>,
<patches@...ts.linux.dev>
Subject: Re: [PATCH v5 00/29] x86/resctrl telemetry monitoring
Hi Tony,
On 5/28/25 2:38 PM, Luck, Tony wrote:
> Hi Reinette,
>
> I've begun drafting a new cover letter to explain telemetry.
>
> Here's the introduction. Let me know if it helps cover the
> gaps and ambiguities that you pointed out.
>
> -Tony
>
>
> RMID based telemetry events
> ---------------------------
>
> Each CPU on a system keeps a local count of various events.
>
> Every two milliseconds, or when the value of the RMID field in the
> IA32_PQR_ASSOC MSR is changed, the CPU transmits all the event counts
> together with the value of the RMID to a nearby OOBMSM (Out of band
> management services module) device. The CPU then resets all counters and
> begins counting events for the new RMID or time interval.
>
> The OOBMSM device sums each event count with those received from other
> CPUs, keeping a running total for each event for each RMID.
>
> The operating system can read these counts to gather a picture of
> system-wide activity for each of the logged events per-RMID.
>
> E.g. the operating system may assign RMID 5 to all the tasks performing
> a certain job. When it reads the core energy event counter for
> RMID 5 it will see the total energy consumed by CPU cores for all tasks
> in that job while running on any CPU. This is a much lower overhead
> mechanism to track events per job than the typical "perf" approach
> of reading counters on every context switch.
>
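As I understand it, the flow described above can be sketched as below
(hypothetical names for illustration only, not the kernel or hardware
interface: `OOBMSM`, `CPU`, `flush()`, and the event name "core_energy"
are all made up for this sketch):

```python
# Minimal model of the quoted RMID telemetry flow: each CPU accumulates
# event counts locally, then flushes them, tagged with its current RMID,
# to an aggregator standing in for the OOBMSM device, which keeps a
# running per-RMID total for each event.

from collections import defaultdict

class OOBMSM:
    def __init__(self):
        # totals[rmid][event] -> running system-wide sum
        self.totals = defaultdict(lambda: defaultdict(int))

    def receive(self, rmid, counts):
        for event, value in counts.items():
            self.totals[rmid][event] += value

    def read(self, rmid, event):
        # What the OS reads: the system-wide total for (RMID, event).
        return self.totals[rmid][event]

class CPU:
    def __init__(self, oobmsm):
        self.oobmsm = oobmsm
        self.rmid = 0
        self.counts = defaultdict(int)

    def count(self, event, value):
        self.counts[event] += value

    def flush(self):
        # Happens every 2 ms, or when IA32_PQR_ASSOC.rmid is rewritten.
        self.oobmsm.receive(self.rmid, dict(self.counts))
        self.counts.clear()

    def set_rmid(self, rmid):
        self.flush()        # counts so far belong to the old RMID
        self.rmid = rmid

oobmsm = OOBMSM()
cpu0, cpu1 = CPU(oobmsm), CPU(oobmsm)

# A job with RMID 5 runs on both CPUs.
cpu0.set_rmid(5); cpu0.count("core_energy", 30); cpu0.flush()
cpu1.set_rmid(5); cpu1.count("core_energy", 12); cpu1.flush()

print(oobmsm.read(5, "core_energy"))  # 42: summed across CPUs
```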
Could you please elaborate on the CPU vs core distinction?
If the example above is for a system with below topology (copied from
Documentation/arch/x86/topology.rst):
[package 0] -> [core 0] -> [thread 0] -> Linux CPU 0
-> [thread 1] -> Linux CPU 1
-> [core 1] -> [thread 0] -> Linux CPU 2
-> [thread 1] -> Linux CPU 3
In the example, RMID 5 is assigned to tasks running "a certain job", for
convenience I will name it "jobA". Consider if the example is extended
with RMID 6 assigned to tasks running another job, "jobB".
If a jobA task is scheduled on CPU 0 and a jobB task is scheduled on CPU 1
then it may look like:
[package 0] -> [core 0] -> [thread 0] -> Linux CPU 0 #RMID 5
-> [thread 1] -> Linux CPU 1 #RMID 6
-> [core 1] -> [thread 0] -> Linux CPU 2
-> [thread 1] -> Linux CPU 3
The example above states:
When it reads the core energy event counter for RMID 5 it will
see the total energy consumed by CPU cores for all tasks in that
job while running on any CPU.
With RMID 5 and RMID 6 both running on core 0, and the statement that
"RMID 5 will see the total energy consumed by CPU cores", does this mean
that reading the RMID 5 counter will return the energy consumed by
core 0 while RMID 5 is assigned to CPU 0? Since core 0 contains both
CPU 0 and CPU 1, would reading RMID 5 thus return data for both RMID 5
and RMID 6 (jobA and jobB)?
And vice versa: would reading RMID 6 also include energy consumed by
tasks running with RMID 5?
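To make the question concrete, here is a toy illustration (hypothetical
attribution models, not anything from the patch series): core 0 consumes
100 units of energy while its two SMT threads run under different RMIDs.

```python
# Core 0 consumed this much energy while CPU 0 ran with RMID 5 and
# CPU 1 ran with RMID 6. A core-scoped counter cannot tell the two
# threads' contributions apart.
core_energy = 100
resident_rmids = {5, 6}

# Model A: charge the whole core energy to every resident RMID.
# Reading RMID 5 then includes energy driven by RMID 6's work, and the
# per-RMID totals sum to more than the core actually consumed.
model_a = {rmid: core_energy for rmid in resident_rmids}
print(sum(model_a.values()))   # 200 > 100: double counting

# Model B: split the core energy evenly between resident RMIDs.
# Totals now add up, but each RMID's share is an approximation, not the
# energy its own tasks actually caused.
model_b = {rmid: core_energy / len(resident_rmids) for rmid in resident_rmids}
print(sum(model_b.values()))   # 100.0
```

Which of these (or something else) does the hardware do?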
Reinette