linux-kernel - Re: [HELP] CPU Hard LOCKUP during boot up with HPET clock source

Date:   Mon, 9 Apr 2018 12:26:42 +0530
From:   Pintu Kumar <pintu.ping@...il.com>
To:     open list <linux-kernel@...r.kernel.org>, linux-pm@...r.kernel.org
Subject: Re: [HELP] CPU Hard LOCKUP during boot up with HPET clock source

Hi,

A simpler question: is there a way to skip the currently available
clock source (hpet) and let the kernel pick the next one?
I think that would solve our problem.


Thanks,
Pintu
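
For illustration, here is a minimal userspace sketch of "pick the next
one". It assumes the sysfs list quoted below (tsc hpet acpi_pm) and only
parses that string; next_clocksource() is a hypothetical helper, not a
kernel API. It does not run inside a driver init function, so it only
shows the selection logic, not an in-kernel fix.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel API): given the whitespace-separated
 * list read from available_clocksource and a name to skip, copy the
 * entry that follows it into out.
 * Returns 0 on success, -1 if skip is not found, is the last entry,
 * or out is too small.
 */
static int next_clocksource(const char *available, const char *skip,
                            char *out, size_t outlen)
{
    size_t skiplen = strlen(skip);
    const char *p = available;
    int found = 0;

    while (*p) {
        /* Step over separators between entries. */
        p += strspn(p, " \t\n");
        if (!*p)
            break;

        size_t len = strcspn(p, " \t\n");

        if (found) {
            /* This is the entry right after the skipped one. */
            if (len + 1 > outlen)
                return -1;
            memcpy(out, p, len);
            out[len] = '\0';
            return 0;
        }
        if (len == skiplen && strncmp(p, skip, len) == 0)
            found = 1;
        p += len;
    }
    return -1;
}
```

With the list above and "hpet" as the name to skip, this yields
"acpi_pm". Once the system is up, root can write such a name to
/sys/devices/system/clocksource/clocksource0/current_clocksource to
switch clock sources at runtime.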


On Fri, Apr 6, 2018 at 8:37 PM, Pintu Kumar <pintu.ping@...il.com> wrote:
> Hi,
>
> First the few details:
> Kernel: 4.9.20
> Machine: x86_64 (AMD)
> Model: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
> Cores: 8
> Available clock source:
> # cat /sys/devices/system/clocksource/clocksource0/available_clocksource
> tsc hpet acpi_pm
>
> Problem:
> [   28.027409] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
> Modules linked in:
> [   28.136317] RIP: 0010:[<ffffffff98058c43>] [<ffffffff98058c43>] read_hpet+0xb3/0x120
> [...]
>
> ------------------
> This lockup happens during boot, when the CPU is stuck for about 28 seconds.
> It is caused by our internal code changes: in our init function we run a
> calibration loop of 10,000,000 (10MHz) iterations, twice.
> The LOCKUP is triggered by this loop.
>
> But we observed that the main issue is the clock source that is
> available at that time.
> When this loop executes, the available clock source is HPET (not TSC).
> With HPET the loop runs slower: it takes almost 28 seconds to complete,
> so the boot time also increases by 28 seconds.
> With TSC, by contrast, the loop completes in less than 4 seconds, so
> we don't get the LOCKUP.
>
> Thus, the lockup is happening only because the loop executes with HPET
> clock source.
>
> To fix the problem, I tried the following approach:
> 1) Use late_initcall for our driver init to delay the call until TSC
> clock source is ready.
>     => With this there is no LOCKUP trace and no impact on boot time.
>     This is because the loop executes with TSC.
>
> 2) We have 2 loops, so I wrapped each loop in its own
> local_irq_save/restore section.
>      => With this also there is no backtrace seen.
>      => But boot time is increased.
>
> 3) I used delayed_workqueue to delay the execution of the loop by 5
> seconds, until TSC is ready.
>     => With this there is no back trace and also boot time is normal.
>     => But if we disable TSC then we still get the back trace.
>
> 4) Disabled HPET from kernel command line using : hpet=disable
>     => This also works as the loop executes with the next available
> clock source: acpi_pm
>     => But changing boot args is not recommended in our case.
>
> 5) Disable HPET related configs in kernel
>     => CONFIG_HPET=n
>     => CONFIG_HPET_TIMER=n
>     => This method does not work as we were not able to disable
> HPET_TIMER on x86_64.
>
> 6) Use hpet_disable() from our code.
>     => This method also does not work. It actually does not disable
> HPET clock source.
>
>
> -----------------------------
> We would therefore like your opinion on which of these is the right way
> to fix this lockup during boot.
>
> Is there a way to deliberately fall back to the next available clock
> source (acpi_pm) instead of hpet, from the source code, before executing
> our loop?
>
>
> Please let me know if there are alternate options.
>
>
>
> Thanks,
> Pintu
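
The timings quoted above can be turned into a rough per-iteration cost.
If the 28 seconds covers both loops (2 x 10,000,000 iterations) and one
clock read per iteration dominates, then HPET costs about 1.4 us per
iteration, versus under 0.2 us with TSC. Both conditions are assumptions;
the message does not state them. The userspace sketch below measures the
average cost of a clock read under whichever clock source is currently
selected and projects the loop time from it. The function names are
illustrative only, and clock_gettime() from userspace is not the same
code path as the in-kernel read_hpet(), so treat the numbers as
indicative.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Average cost of one clock read in nanoseconds; reads must be > 0. */
static double clock_read_cost_ns(uint64_t reads)
{
    volatile uint64_t sink = 0; /* keeps the reads from being optimized away */
    uint64_t start = now_ns();

    for (uint64_t i = 0; i < reads; i++)
        sink += now_ns();
    (void)sink;

    return (double)(now_ns() - start) / (double)reads;
}

/*
 * Projected wall time in seconds for a loop of `iterations` passes,
 * assuming (hypothetically) that one clock read per pass dominates.
 */
static double projected_loop_seconds(double ns_per_read, uint64_t iterations)
{
    return ns_per_read * (double)iterations / 1e9;
}
```

For example, projected_loop_seconds(clock_read_cost_ns(1000000), 20000000)
can be compared with hpet and then tsc selected as the current clock
source, to see whether the gap is in the same range as the 28 seconds
versus 4 seconds reported above.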