lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAJvTdK=_v9q2eGMB6qG3iaDhXMzQHz-EJ4NeDEfBe2fbv+wKfQ@mail.gmail.com>
Date: Tue, 25 Nov 2025 13:13:54 -0500
From: Len Brown <lenb@...nel.org>
To: Emily Ehlert <ehemily@...zon.de>
Cc: linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Emily Ehlert <ehemily@...zon.com>
Subject: Re: [PATCH 2/2] tools/power/turbostat: Fix division by zero when TDP
 calculation fails

Can you share what system configuration went down this error path?

I don't have a good feeling about this remedy.

The reason we get TDP is to estimate if RAPL will overflow its
undersized energy counters during the measurement interval.  We arm a
timer appropriately to accumulate RAPL if it looks like this will be
possible.

It is possible to run into that underlying issue, but it is not common usage.

So my question is if you have a system where RAPL works fine, but we
simply can't get TDP to build this safety net?

If so, it may make more sense to adjust the safety net, say, using
some conservative default timer interval, rather than not report RAPL
at all.

And if we do opt to disable RAPL, I'd rather uniformly use the
existing build-in-counter macros for testing if a counter is actually
present and enabled instead of a new global flag.

thanks,
-Len

On Thu, Nov 13, 2025 at 2:17 PM Emily Ehlert <ehemily@...zon.de> wrote:
>
> From: Emily Ehlert <ehemily@...zon.com>
>
> turbostat uses hard coded features for CPU families and expects access to
> RAPL (Running Average Power Limit) MSRs. When RAPL or power info is not
> available, turbostat reads PKG_POWER_INFO MSR to calculate TDP with
> RAPL_POWER_UNIT MSR. If TDP calculation results in 0, no zero check is
> performed and the 0 TDP is used in division, resulting in an invalid
> rapl_joule_counter_range. This variable is later used in msr_sum_record()
> as a timer parameter to timer_settime() syscall, causing issues.
>
> Fix the issue by:
>
> - Introduce zero check for tdp in rapl_probe_intel() and rapl_probe_amd()
> - Introduce global variable no_rapl which is set to true if zero check fails
> - Skip RAPL-dependent functions when no_rapl is true
> - Add assertions and guards to prevent RAPL operations when disabled
>
> Signed-off-by: Emily Ehlert <ehemily@...zon.com>
> ---
>  tools/power/x86/turbostat/turbostat.c | 27 ++++++++++++++++++++++++---
>  1 file changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c
> index 9a2be201a3a6..9c6ee0acbe12 100644
> --- a/tools/power/x86/turbostat/turbostat.c
> +++ b/tools/power/x86/turbostat/turbostat.c
> @@ -531,6 +531,7 @@ static struct timeval procsysfs_tv_begin;
>  int ignore_stdin;
>  bool no_msr;
>  bool no_perf;
> +bool no_rapl;
>
>  enum gfx_sysfs_idx {
>         GFX_rc6,
> @@ -3119,6 +3120,10 @@ int dump_counters(PER_THREAD_PARAMS)
>
>  double rapl_counter_get_value(const struct rapl_counter *c, enum rapl_unit desired_unit, double interval)
>  {
> +       if (no_rapl) {
> +               return NAN;
> +       }
> +
>         assert(desired_unit != RAPL_UNIT_INVALID);
>
>         /*
> @@ -4696,6 +4701,8 @@ static size_t cstate_counter_info_count_perf(const struct cstate_counter_info_t
>
>  void write_rapl_counter(struct rapl_counter *rc, struct rapl_counter_info_t *rci, unsigned int idx)
>  {
> +       assert(!no_rapl);
> +
>         if (rci->source[idx] == COUNTER_SOURCE_NONE)
>                 return;
>
> @@ -4706,6 +4713,8 @@ void write_rapl_counter(struct rapl_counter *rc, struct rapl_counter_info_t *rci
>
>  int get_rapl_counters(int cpu, unsigned int domain, struct core_data *c, struct pkg_data *p)
>  {
> +       assert(!no_rapl);
> +
>         struct platform_counters *pplat_cnt = p == package_odd ? &platform_counters_odd : &platform_counters_even;
>         unsigned long long perf_data[NUM_RAPL_COUNTERS + 1];
>         struct rapl_counter_info_t *rci;
> @@ -5147,7 +5156,7 @@ int get_counters(PER_THREAD_PARAMS)
>         if (!is_cpu_first_thread_in_core(t, c, p))
>                 goto done;
>
> -       if (platform->has_per_core_rapl) {
> +       if (platform->has_per_core_rapl && !no_rapl) {
>                 status = get_rapl_counters(cpu, get_rapl_domain_id(cpu), c, p);
>                 if (status != 0)
>                         return status;
> @@ -5213,7 +5222,7 @@ int get_counters(PER_THREAD_PARAMS)
>         if (DO_BIC(BIC_SYS_LPI))
>                 p->sys_lpi = cpuidle_cur_sys_lpi_us;
>
> -       if (!platform->has_per_core_rapl) {
> +       if (!platform->has_per_core_rapl && !no_rapl) {
>                 status = get_rapl_counters(cpu, get_rapl_domain_id(cpu), c, p);
>                 if (status != 0)
>                         return status;
> @@ -7650,6 +7659,12 @@ void rapl_probe_intel(void)
>
>         tdp = get_tdp_intel();
>
> +       if (tdp == 0.0) {
> +               no_rapl = true;
> +               fprintf(outf, "RAPL: Could not calculate TDP (TDP: %.0f, MSR_RAPL_POWER_UNIT: %llx)\n", tdp, msr);
> +               return;
> +       }
> +
>         rapl_joule_counter_range = 0xFFFFFFFF * rapl_energy_units / tdp;
>         if (!quiet)
>                 fprintf(outf, "RAPL: %.0f sec. Joule Counter Range, at %.0f Watts\n", rapl_joule_counter_range, tdp);
> @@ -7680,6 +7695,12 @@ void rapl_probe_amd(void)
>
>         tdp = get_tdp_amd();
>
> +       if (tdp == 0.0) {
> +               no_rapl = true;
> +               fprintf(outf, "RAPL: Could not calculate TDP (TDP: %.0f, MSR_RAPL_POWER_UNIT: %llx)\n", tdp, msr);
> +               return;
> +       }
> +
>         rapl_joule_counter_range = 0xFFFFFFFF * rapl_energy_units / tdp;
>         if (!quiet)
>                 fprintf(outf, "RAPL: %.0f sec. Joule Counter Range, at %.0f Watts\n", rapl_joule_counter_range, tdp);
> @@ -11215,7 +11236,7 @@ int main(int argc, char **argv)
>
>         turbostat_init();
>
> -       if (!no_msr)
> +       if (!no_msr && !no_rapl)
>                 msr_sum_record();
>
>         /* dump counters and exit */
> --
> 2.47.3
>
>
>
>
> Amazon Web Services Development Center Germany GmbH
> Tamara-Danz-Str. 13
> 10243 Berlin
> Geschaeftsfuehrung: Christian Schlaeger, Christof Hellmis
> Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
> Sitz: Berlin
> Ust-ID: DE 365 538 597
>


-- 
Len Brown, Intel Open Source Technology Center

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ