Message-ID: <fff246da-2a10-3463-614c-e54cd8cf78e7@gmail.com>
Date:   Mon, 18 Oct 2021 18:56:17 -0700
From:   Norbert <nbrtt01@...il.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Steven Rostedt <rostedt@...dmis.org>, linux-kernel@...r.kernel.org,
        Ingo Molnar <mingo@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Yunfeng Ye <yeyunfeng@...wei.com>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>, frederic@...nel.org
Subject: Re: Performance regression: thread wakeup time (latency) increased up
 to 3x

On 10/18/21 04:25, Peter Zijlstra wrote:
> On Fri, Oct 15, 2021 at 09:08:58PM -0700, Norbert wrote:
> 
>>>>> On Fri, Oct 15, 2021 at 12:43:45AM -0700, Norbert wrote:
>>>>>> Performance regression: thread wakeup time (latency) increased up to 3x.
>>>>>>
>>>>>> Happened between 5.13.8 and 5.14.0. Still happening at least on 5.14.11.
> 
>> So git-bisect finally identified the following commit.
>> The performance difference came in a single step. Throughout the bisection,
>> each build showed either the slow time or the fast time from my first post,
>> as far as I could tell.
>>
>> It is a bit unfortunate that this comes from an attempt to reduce OS noise.
>>
>> -----------------------------------------------------
>> commit a5183862e76fdc25f36b39c2489b816a5c66e2e5
>> Author: Yunfeng Ye <yeyunfeng@...wei.com>
>> Date:   Thu May 13 01:29:16 2021 +0200
>>
>>      tick/nohz: Conditionally restart tick on idle exit
>>
>>      In nohz_full mode, switching from idle to a task will unconditionally
>>      issue a tick restart. If the task is alone in the runqueue or is the
>>      highest priority, the tick will fire once then eventually stop. But that
>>      alone is still undesired noise.
>>
>>      Therefore, only restart the tick on idle exit when it's strictly
>>      necessary.
>>
>>      Signed-off-by: Yunfeng Ye <yeyunfeng@...wei.com>
>>      Signed-off-by: Frederic Weisbecker <frederic@...nel.org>
>>      Signed-off-by: Ingo Molnar <mingo@...nel.org>
>>      Acked-by: Peter Zijlstra <peterz@...radead.org>
>>      Link:
>> https://lore.kernel.org/r/20210512232924.150322-3-frederic@kernel.org
>> -----------------------------------------------------
>>
>> Is there anything else to do to complete this report?
> 
> So it _could_ be you're seeing increased use of deeper idle states due
> to less noise. I'm forever forgetting what the most friendly tool is for
> checking that (powertop can I think), Rafael?
> 
> One thing to try is boot with idle=halt and see if that makes a
> difference.
> 
> Also, let me Cc all the people involved.. the thread starts:
> 
>    https://lkml.kernel.org/r/035c23b4-118e-6a35-36d9-1b11e3d679f8@gmail.com
> 


Booting with idle=halt results in a thread wakeup time of around 2000 
ns, roughly midway between the kernel 5.13 value of 1080 ns and the 
kernel 5.14/5.15 value of around 3550 ns. The wake call time remains at 
740 ns (as bad as without this setting). I'm not sure how much that 
tells us. By the way, using cpufreq.off=1 seems to have no effect at 
all.
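
One way to check whether deeper idle states are being entered more often is to snapshot the per-CPU cpuidle counters in sysfs before and after a benchmark run. The following is only a minimal sketch and not what produced the numbers above. The helper names `read_cpuidle_states` and `diff_states` are made up for illustration, and it assumes the usual `/sys/devices/system/cpu/cpuN/cpuidle/stateM/{name,usage,time}` layout, with `time` in microseconds. If no cpuidle driver is active the directory may be missing, in which case the sketch returns an empty list.

```python
from pathlib import Path


def read_cpuidle_states(cpu, root="/sys/devices/system/cpu"):
    """Return [(name, usage, time_us), ...] for one CPU, or [] if cpuidle is absent."""
    base = Path(root) / f"cpu{cpu}" / "cpuidle"
    if not base.is_dir():
        return []
    states = []
    # Order state0, state1, ..., state10 numerically rather than lexically.
    for d in sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:])):
        name = (d / "name").read_text().strip()
        usage = int((d / "usage").read_text())    # number of times the state was entered
        time_us = int((d / "time").read_text())   # total residency in microseconds
        states.append((name, usage, time_us))
    return states


def diff_states(before, after):
    """Per-state deltas between two snapshots of the same CPU."""
    return [(name, u2 - u1, t2 - t1)
            for (name, u1, t1), (_, u2, t2) in zip(before, after)]


if __name__ == "__main__":
    # CPU number 3 is an arbitrary example; pick one of the isolated CPUs.
    for name, usage, time_us in read_cpuidle_states(3):
        print(f"{name:10s} entries={usage} residency_us={time_us}")
```

Taking one snapshot before the test and one after, then passing both to `diff_states`, would show which states the CPU actually spent its idle time in during the run.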

In the meantime I verified the finding from the git bisection by 
manually reverting the changes from this commit in the source code of 
the 5.15-rc5 code base. By doing so the timings for the 
isolated/nohz_full CPUs come back almost to the (good) 5.13 values (both 
wakeup and wake-call).

However, the timings for the non-isolated CPUs are unaffected and remain 
at the regressed level, about 1.3x slower for the wakeup and 1.4x slower 
for the wake call. So this apparently requires a separate, independent 
git-bisect and is probably a second, separate issue (assuming it is also 
due to a single change).

I've tried a bit to narrow down the cause of the 3.3x slowdown but am 
still trying to find my way through the maze of little functions... :-).
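
For reference, the slowdown factors follow directly from the wakeup times quoted earlier in this message. This is just the arithmetic written out; the dictionary labels and the `slowdown` helper are illustrative names, and the values are the approximate figures given above.

```python
BASELINE_NS = 1080  # thread wakeup time on kernel 5.13 (from this message)

# Approximate wakeup times reported above, in nanoseconds.
wakeup_ns = {
    "5.14/5.15": 3550,
    "idle=halt": 2000,
}


def slowdown(ns, baseline_ns=BASELINE_NS):
    """Ratio of a measured wakeup time to the 5.13 baseline."""
    return ns / baseline_ns


if __name__ == "__main__":
    for label, ns in wakeup_ns.items():
        print(f"{label}: {slowdown(ns):.2f}x")
```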
