Open Source and information security mailing list archives
Date:   Thu, 16 Jan 2020 20:10:16 +0100
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Waiman Long <longman@...hat.com>,
        Robert Richter <rrichter@...vell.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Ingo Molnar <mingo@...nel.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@...r.kernel.org>,
        Mike Rapoport <rppt@...ux.ibm.com>,
        Kees Cook <keescook@...omium.org>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH v2] watchdog: Fix possible soft lockup warning at bootup

Waiman Long <longman@...hat.com> writes:

> On 1/16/20 11:57 AM, Thomas Gleixner wrote:
>>> So your theory the MONOTONIC clock runs differently/wrongly could
>>> explain that (assuming this drives the sched clock). Though, I am
>> No. sched_clock() is separate. It uses a raw timestamp (in your case
>> from the ARM arch timer) and converts it to something which is close to
>> proper time. So my assumption was based on the printout Waiman had:
>>
>>  [ 1... ] CPU.... watchdog_fn now  170000000
>>  [ 25.. ] CPU.... watchdog_fn now 4170000000
>>
>> I assumed that now comes from ktime_get() or something like
>> that. Waiman?
>
> I printed out the now parameter of the __hrtimer_run_queues() call.

Yes. That's clock MONOTONIC.

> So from the timer perspective, it is losing time. For watchdog, the soft
> expiry time is 4s. The watchdog function won't be called until the
> timer's time advances 4s or more. That corresponds to about 24s in
> timestamp time for that particular class of systems.

Right. And assuming that the firmware call is the culprit, this has an
explanation.

Could you please take sched_clock() timestamps before and after the
firmware call which kicks the secondary CPUs into life to verify that?

They should sum up to the amount of time which gets lost across
smp_init().

Thanks,

        tglx
