lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <19e61b7e-022e-b384-1f37-7354b7ee889d@linux.intel.com>
Date:   Wed, 5 Jul 2023 10:23:44 -0700
From:   Keyon Jie <yang.jie@...ux.intel.com>
To:     Thomas Gleixner <tglx@...utronix.de>, x86@...nel.org,
        linux-kernel@...r.kernel.org
Cc:     Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "H . Peter Anvin" <hpa@...or.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Yair Podemsky <ypodemsk@...hat.com>
Subject: Re: [PATCH] x86/aperfmperf: Fix the fallback condition in
 arch_freq_get_on_cpu()



On 6/30/23 05:35, Thomas Gleixner wrote:
> On Mon, Jun 26 2023 at 12:36, Keyon Jie wrote:
>> diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c
>> index fdbb5f07448f..24e24e137226 100644
>> --- a/arch/x86/kernel/cpu/aperfmperf.c
>> +++ b/arch/x86/kernel/cpu/aperfmperf.c
>> @@ -432,7 +432,7 @@ unsigned int arch_freq_get_on_cpu(int cpu)
>>   	 * Bail on invalid count and when the last update was too long ago,
>>   	 * which covers idle and NOHZ full CPUs.
>>   	 */
>> -	if (!mcnt || (jiffies - last) > MAX_SAMPLE_AGE)
>> +	if (!mcnt || (jiffies - last) > MAX_SAMPLE_AGE * cpu_khz)
> 
> What?
> 
> #define MAX_SAMPLE_AGE  ((unsigned long)HZ / 50)
> 
> HZ is the number of ticks (jiffies) per second. 20ms is 1/50 of a
> second.
> 
> As the sample age is measured in jiffies and the maximum is defined to
> be 20ms, the existing code is correct.
> 
> With your change the condition resolves to:
> 
>       delta > MAX_SAMPLE_AGE * cpu_khz
> 
> cpu_khz = Nominal CPU frequency / 1000
> 
> Ergo:
> 
>       delta > (HZ / 50) * (cpufreq / 1000)
> 
>                  HZ * cpufreq
> -->  delta > ------------------
>                    50000
> 
> Let's describe cpufreq in GHz:
> 
>                  HZ * G * 1e9
> -->  delta > ------------------
>                    50000
> 
> -->  delta > HZ * G * 20000
> 
> delta is calculated in jiffies, i.e. the number of ticks since the last
> invocation. Because HZ is ticks per second, the resulting timeout
> measured in seconds is:
> 
>           HZ * G * 20000 / HZ
> 
> -->      G * 20000 seconds
> 
> 20000 seconds for a 1GHz CPU, 40000 seconds for a 4GHz CPU independent
> of the actual HZ value.
> 
> jiffies are incremented once per tick, i.e. at tick frequency. The
> number of ticks required to reach 20ms depends obviously on the tick
> frequency, aka HZ.
> 
> HZ         ticks per second     tick period     Number of ticks which
>                                                  are equivalent to 20ms
>   100        100                 10ms             2
>   250        250                  4ms             5
> 1000       1000                  1ms            10
> 
> And that's what the code does correctly:
> 
> #define MAX_SAMPLE_AGE  ((unsigned long)HZ / 50)
> 
> No?
> 
>>  From the commit f3eca381bd49 on, the fallback condition about the 'the
>> last update was too long' have been comparing ticks and milliseconds by
>> mistake, which leads to that the condition is met and the fallback
>> method is used frequently.
> 
> The comparison is comparing a tick delta with a maximum number of ticks
> and that's not a mistake. It's simply correct.
> 
>> The change to compare ticks here corrects that and fixes related issues
>> have been seen on x86 platforms since 5.18 kernel.
> 
> I have no idea what you are trying to "fix" here, but that's moot as
> there is nothing to fix.

Hi Thomas and Dave,

Thank you for educating on this, I think you are totally right. So the 
original issue described in the bugzilla is not caused by what I 
mentioned here. Please ignore this patch, we need to figure out another 
fix for the issue, sorry for the confusion.

Thanks,
~Keyon

> 
> Thanks,
> 
>          tglx

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ