Message-ID: <BEYP281MB550957A6BA42B6371400472FBADBA@BEYP281MB5509.DEUP281.PROD.OUTLOOK.COM>
Date: Mon, 1 Dec 2025 08:55:40 +0000
From: "Ehlert, Emily" <ehemily@...zon.de>
To: Len Brown <lenb@...nel.org>, "Zhang, Rui" <rui.zhang@...el.com>
CC: "linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Emily Ehlert
<ehemily@...zon.com>
Subject: Re: [PATCH 2/2] tools/power/turbostat: Fix division by zero when TDP
calculation fails
> I cut a patch to replace all the tests of platform->rapl_msrs with a
> global variable that was initialized to (read-only) platform->rapl_msrs,
> but was cleared before the test in rapl_probe_intel(), and when I run
> with --no-perf (so that turbostat must use MSRs) it seems to disable
> all the RAPL stuff cleanly.
That is very promising.
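
The override described in the quoted paragraph can be sketched roughly as follows. This is a minimal illustration, not the actual patch: the struct layout, the `RAPL_PKG`/`RAPL_DRAM` bit names, `rapl_msrs_active`, and the helper functions are all hypothetical stand-ins. The idea is only that the per-model table stays read-only while every RAPL check consults a writable copy that the probe may clear.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a per-model capability table entry. */
struct platform_features {
	unsigned int rapl_msrs;	/* bitmask of RAPL MSRs the CPU model should have */
};

/* Illustrative bit names; not the real turbostat definitions. */
#define RAPL_PKG  (1u << 0)
#define RAPL_DRAM (1u << 1)

static const struct platform_features example_features = {
	.rapl_msrs = RAPL_PKG | RAPL_DRAM,
};
static const struct platform_features *platform = &example_features;

/* Writable copy that all RAPL checks use instead of platform->rapl_msrs. */
static unsigned int rapl_msrs_active;

/* Seed the writable copy from the read-only table at startup. */
static void init_rapl_msrs(void)
{
	rapl_msrs_active = platform->rapl_msrs;
}

/* Called early in the probe when the MSRs turn out to be unusable. */
static void disable_rapl(void)
{
	rapl_msrs_active = 0;
}

/* Every former test of platform->rapl_msrs becomes a test of the copy. */
static bool rapl_enabled(unsigned int which)
{
	return (rapl_msrs_active & which) != 0;
}
```

Because only the copy is cleared, counters that do not depend on RAPL (for example CPU%c1) would be unaffected under this scheme.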
> Once upon a time we used to actually probe the RAPL msrs by trying to read them.
> If they failed to read or were zero, we would fail the probe and
> disable the counter.
> But that turned out to be problematic b/c some platforms had non-zero
> unsupported MSRs etc.
> so we moved to hard-coding the platform capabilities in a table.
I can see how this creates problems.
> In the VM, does the MSR read fail entirely, or does it just return 0
> values for unsupported MSRs?
Reading the MSR succeeds, but the value read is 0. So, at least for the
Nitro Hypervisor, testing for 0 would be sufficient. I am not sure how
other hypervisors handle this, but I assume they either return 0 as well
or do not provide the /dev/cpu/*/msr file at all.
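
A minimal sketch of such a test, distinguishing the three cases mentioned above (non-zero value, zero value, no usable msr device). The function and enum names are hypothetical; the only facts relied on are that MSR_RAPL_POWER_UNIT lives at address 0x606 and that the Linux msr driver exposes each register at the file offset equal to its address. Whether a zero value is a safe signal on other hypervisors is an open assumption.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define MSR_RAPL_POWER_UNIT 0x606	/* address of the RAPL unit MSR */

enum rapl_probe_result {
	RAPL_PROBE_OK,		/* MSR readable and non-zero */
	RAPL_PROBE_ZERO,	/* MSR readable but reads as 0 (the Nitro case) */
	RAPL_PROBE_NO_ACCESS,	/* msr device missing, or the read failed */
};

/* msr_path is normally something like /dev/cpu/0/msr. */
static enum rapl_probe_result probe_rapl_unit(const char *msr_path)
{
	uint64_t val = 0;
	enum rapl_probe_result res = RAPL_PROBE_NO_ACCESS;
	int fd = open(msr_path, O_RDONLY);

	if (fd < 0)
		return res;

	/* Seek to the MSR address, then read exactly one 8-byte register. */
	if (lseek(fd, MSR_RAPL_POWER_UNIT, SEEK_SET) == MSR_RAPL_POWER_UNIT &&
	    read(fd, &val, sizeof(val)) == (ssize_t)sizeof(val))
		res = val ? RAPL_PROBE_OK : RAPL_PROBE_ZERO;

	close(fd);
	return res;
}
```

Under this sketch, both RAPL_PROBE_ZERO and RAPL_PROBE_NO_ACCESS would lead to RAPL being disabled, while the caller can still log which of the two happened.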
Here is a log from running the Ubuntu 24.04 bundled turbostat on a
c7i.xlarge instance (Intel Sapphire Rapids CPU):
```
# turbostat -n 1 -i 1
turbostat version 2025.02.02 - Len Brown <lenb@...nel.org>
Kernel command line: BOOT_IMAGE=/vmlinuz-6.14.0-1015-aws root=PARTUUID=cd553419-794d-4da4-9ba5-c355f5f9f74d ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295 panic=-1
CPUID(0): GenuineIntel 0x1f CPUID levels
CPUID(1): family:model:stepping 0x6:8f:8 (6:143:8) microcode 0x2b000643
CPUID(0x80000000): max_extended_levels: 0x80000008
CPUID(1): SSE3 MONITOR - - - TSC MSR - HT -
CPUID(6): APERF, TURBO, No-DTS, No-PTM, No-HWP, No-HWPnotify, No-HWPwindow, No-HWPepp, No-HWPpkg, No-EPB
cpu0: MSR_IA32_MISC_ENABLE: 0x00000001 (No-TCC No-EIST No-MWAIT PREFETCH TURBO)
CPUID(7): No-SGX No-Hybrid
CPUID(0x16): base_mhz: 0 max_mhz: 0 bus_mhz: 0
cpu0: MSR_PLATFORM_INFO: 0x80080001800
8 * 100.0 = 800.0 MHz max efficiency frequency
24 * 100.0 = 2400.0 MHz base frequency
cpu0: MSR_TURBO_RATIO_LIMIT: 0x2020212123242526
cpu0: MSR_TURBO_RATIO_LIMIT1: 0x302e2c2a26221e18
32 * 100.0 = 3200.0 MHz max turbo 48 active cores
32 * 100.0 = 3200.0 MHz max turbo 46 active cores
33 * 100.0 = 3300.0 MHz max turbo 44 active cores
33 * 100.0 = 3300.0 MHz max turbo 42 active cores
35 * 100.0 = 3500.0 MHz max turbo 38 active cores
36 * 100.0 = 3600.0 MHz max turbo 34 active cores
37 * 100.0 = 3700.0 MHz max turbo 30 active cores
38 * 100.0 = 3800.0 MHz max turbo 24 active cores
cpu0: MSR_CONFIG_TDP_NOMINAL: 0x00000000 (base_ratio=0)
cpu0: MSR_CONFIG_TDP_LEVEL_1: 0x00000000 ()
cpu0: MSR_CONFIG_TDP_LEVEL_2: 0x00000000 ()
cpu0: MSR_CONFIG_TDP_CONTROL: 0x00000000 ( lock=0)
cpu0: MSR_TURBO_ACTIVATION_RATIO: 0x00000000 (MAX_NON_TURBO_RATIO=0 lock=0)
NSFOD /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
cpu0: MSR_MISC_PWR_MGMT: 0x00000100 (ENable-EIST_Coordination DISable-EPB ENable-OOB)
cpu0: MSR_IA32_POWER_CTL: 0x00000000 (C1E auto-promotion: DISabled)
C-state Pre-wake: ENabled
cpu0: MSR_PKG_CST_CONFIG_CONTROL: 0x00008000 (locked, pkg-cstate-limit=0 (pc0))
/dev/cpu_dma_latency: 2000000000 usec (default)
current_driver: intel_idle
current_governor: menu
current_governor_ro: menu
cpu0: POLL: CPUIDLE CORE POLL IDLE
cpu0: C1: MWAIT 0x00
cpu0: C1E: MWAIT 0x01
cpu0: C6: MWAIT 0x20
cpu0: MSR_PKGC6_IRTL: 0x00000000 (NOTvalid, 0 ns)
RAPL: inf sec. Joule Counter Range, at 0 Watts
cpu0: MSR_RAPL_POWER_UNIT: 0x00000000 (1.000000 Watts, 1.000000 Joules, 0.000977 sec.)
cpu0: MSR_PKG_POWER_INFO: 0x00000000 (0 W TDP, RAPL 0 - 0 W, 0.000000 sec.)
cpu0: MSR_PKG_POWER_LIMIT: 0x00000000 (UNlocked)
cpu0: PKG Limit #1: DISabled (0.000 Watts, 0.000977 sec, clamp DISabled)
cpu0: PKG Limit #2: DISabled (0.000 Watts, 0.000977* sec, clamp DISabled)
cpu0: MSR_VR_CURRENT_CONFIG: 0x00000000
cpu0: PKG Limit #4: 0.000000 Watts (UNlocked)
cpu0: MSR_DRAM_POWER_INFO,: 0x00000000 (0 W TDP, RAPL 0 - 0 W, 0.000000 sec.)
cpu0: MSR_DRAM_POWER_LIMIT: 0x00000000 (UNlocked)
cpu0: DRAM Limit: DISabled (0.000 Watts, 0.000977 sec, clamp DISabled)
cpu0: MSR_MISC_FEATURE_CONTROL: 0x00000000 (L2-Prefetch L2-Prefetch-pair L1-Prefetch L1-IP-Prefetch)
Can not set timer.
Core CPU Avg_MHz Busy% Bzy_MHz TSC_MHz IPC IRQ NMI SMI POLL C1 C1E C6 POLL% C1% C1E% C6% CPU%c1 CPU%c6 PKG_% RAM_%
- - 18 0.57 3202 2400 0.09 118 0 0 0 0 34 1095 0.00 0.00 0.60 99.19 0.85 98.34 373518666955243456.00 226092884916511136.00
0 0 17 0.54 3201 2400 0.08 24 0 0 0 0 7 266 0.00 0.00 0.52 99.32 0.74 98.45 373628274941881472.00 226159231228225312.00
0 2 19 0.61 3202 2400 0.07 35 0 0 0 0 8 280 0.00 0.00 0.54 99.24 0.74
1 1 17 0.53 3200 2400 0.07 19 0 0 0 0 7 267 0.00 0.00 0.48 99.37 0.97 98.23
1 3 20 0.62 3205 2400 0.12 40 0 0 0 0 12 282 0.00 0.00 0.87 98.90 0.97
```
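
The "inf sec. Joule Counter Range, at 0 Watts" line and the nonsensical PKG_% and RAM_% columns in the log are consistent with an all-zero MSR_RAPL_POWER_UNIT and a TDP of 0 being used as a divisor. The following is a rough sketch of that arithmetic, not turbostat's actual code: the bit fields follow the documented layout of the MSR (power units in bits 3:0, energy units in bits 12:8, time units in bits 19:16), while the function names, the 2^-10 s fallback for a zero time field (inferred from the 0.000977 sec in the log), and the negative return value as an error signal are assumptions.

```c
#include <stdint.h>

struct rapl_units {
	double power_w;		/* Watts per power unit */
	double energy_j;	/* Joules per energy unit */
	double time_s;		/* seconds per time unit */
};

static struct rapl_units decode_rapl_power_unit(uint64_t msr)
{
	struct rapl_units u;
	unsigned int time_exp = (msr >> 16) & 0xF;

	/* Assumed fallback: a zero time field is treated as 2^-10 s. */
	if (time_exp == 0)
		time_exp = 0xA;

	u.power_w = 1.0 / (double)(1ull << (msr & 0xF));
	u.energy_j = 1.0 / (double)(1ull << ((msr >> 8) & 0x1F));
	u.time_s = 1.0 / (double)(1ull << time_exp);
	return u;
}

/*
 * Seconds until a 32-bit energy counter wraps at a given power draw.
 * Returns a negative value when tdp_w is not positive, rather than
 * dividing by zero and producing inf.
 */
static double joule_counter_range_s(double energy_j, double tdp_w)
{
	if (!(tdp_w > 0.0))
		return -1.0;
	return 0xFFFFFFFF * energy_j / tdp_w;
}
```

With an MSR value of 0, the decoded units are 1 W, 1 J and about 0.000977 s, matching the log, and the guarded range function reports an error instead of an infinite interval.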
________________________________________
From: Len Brown <lenb@...nel.org>
Sent: Sunday, November 30, 2025 6:27 AM
To: Ehlert, Emily; Zhang, Rui
Cc: linux-pm@...r.kernel.org; linux-kernel@...r.kernel.org; Emily Ehlert
Subject: RE: [EXTERNAL] [PATCH 2/2] tools/power/turbostat: Fix division by zero when TDP calculation fails
On Fri, Nov 28, 2025 at 9:00 AM Ehlert, Emily <ehemily@...zon.de> wrote:
>
> We are running turbostat inside a VM on the AWS Nitro Hypervisor.
> Guests are not provided with any power measurements. So reading the
> MSR_RAPL_POWER_UNIT will read 0. Since turbostat expects working
> RAPL for this CPU family, failing to read them leads to an exit (because
> setting the timer fails). I agree that the patch should disable RAPL
> not after TDP but after the RAPL_POWER_UNIT MSR read.
>
> I am not experienced with the way turbostat uses the BIC counter macros.
> It seems like these are mostly used for enabling / disabling individual counters?
> How would I go about using them to disable RAPL in general without affecting
> other MSRs such as CPU%c1 which we can and want to read? I would appreciate
> some pointers or rough outline on how I can approach the issue.
Thanks, that is helpful.
I think we agree that this check should be earlier.
In this scenario, we want to override platform->rapl_msrs -- clearing it because
the MSRs are not actually available...
I cut a patch to replace all the tests of platform->rapl_msrs with a
global variable that was initialized to (read-only) platform->rapl_msrs,
but was cleared before the test in rapl_probe_intel(), and when I run
with --no-perf (so that turbostat must use MSRs) it seems to disable
all the RAPL stuff cleanly.
So the question becomes what test to use to determine that we should
not believe platform->, and we should instead nuke RAPL support?
Once upon a time we used to actually probe the RAPL msrs by trying to read them.
If they failed to read or were zero, we would fail the probe and
disable the counter.
But that turned out to be problematic b/c some platforms had non-zero
unsupported MSRs etc.
so we moved to hard-coding the platform capabilities in a table.
In the VM, does the MSR read fail entirely, or does it just return 0
values for unsupported MSRs?
thanks,
Len Brown, Intel Open Source Technology Center