Date:   Tue, 20 Jun 2017 15:03:59 -0700
From:   Andrew Morton <akpm@...ux-foundation.org>
To:     kan.liang@...el.com
Cc:     linux-kernel@...r.kernel.org, dzickus@...hat.com, mingo@...nel.org,
        babu.moger@...cle.com, atomlin@...hat.com, prarit@...hat.com,
        torvalds@...ux-foundation.org, peterz@...radead.org,
        tglx@...utronix.de, eranian@...gle.com, acme@...hat.com,
        ak@...ux.intel.com, Kan Liang <Kan.liang@...el.com>,
        stable@...r.kernel.org
Subject: Re: [PATCH] kernel/watchdog: fix spurious hard lockups

On Tue, 20 Jun 2017 14:33:09 -0700 kan.liang@...el.com wrote:

> From: Kan Liang <Kan.liang@...el.com>
> 
> Some users reported spurious NMI watchdog timeouts.
> 
> We now have more and more systems where the Turbo range is wide enough
> that the NMI watchdog expires faster than the soft watchdog timer that
> updates the interrupt tick the NMI watchdog relies on.
> 
> This problem was originally introduced by commit 58687acba592
> ("lockup_detector: Combine nmi_watchdog and softlockup detector").
> Previously the NMI watchdog would always check jiffies, which were
> ticking fast enough. But now the backing is quite slow so the expire
> time becomes more sensitive.
> 
> For mainline the right fix is to switch the NMI watchdog to reference
> cycles, which tick always at the same rate independent of turbo mode.
> But this requires some complicated changes in perf, which are too
> difficult to backport. Since we need a stable fix too, just increase the
> NMI watchdog rate here to avoid the spurious timeouts. This is not an
> ideal fix because a 3x as large Turbo range could still fail, but for
> now that's not likely.
> 
> ...
>
> The right fix for mainline can be found here.
> perf/x86/intel: enable CPU ref_cycles for GP counter
> perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86
> https://patchwork.kernel.org/patch/9779087/
> https://patchwork.kernel.org/patch/9779089/

Presumably the "right fix" will later be altered to revert this
one-line workaround?
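
For reference, the timing argument in the quoted description can be
sketched as a toy model. This is only an illustration, not code from the
patch: the 10-second threshold, the "soft timer fires five times per
soft-lockup threshold" rule, and the names hrtimer_period_s,
nmi_period_s, spurious_possible and speedup are all assumptions made
here for the arithmetic.

```python
# Toy model of the race described above; every constant here is an assumption.
WATCHDOG_THRESH_S = 10  # assumed hard-lockup threshold, in seconds


def hrtimer_period_s(thresh_s=WATCHDOG_THRESH_S, speedup=1):
    """Wall-clock interval between soft watchdog timer ticks.

    Assumes the timer fires 5 times per soft-lockup threshold (2 * thresh_s).
    `speedup` models a workaround that makes the timer fire more often.
    """
    return 2 * thresh_s / 5 / speedup


def nmi_period_s(turbo_ratio, thresh_s=WATCHDOG_THRESH_S):
    """Wall-clock interval between NMI watchdog checks.

    Assumes the counter is programmed with thresh_s worth of cycles at the
    base frequency, so at turbo_ratio times base it overflows sooner.
    """
    return thresh_s / turbo_ratio


def spurious_possible(turbo_ratio, speedup=1):
    """True if the NMI can fire again before the soft timer has ticked."""
    return nmi_period_s(turbo_ratio) < hrtimer_period_s(speedup=speedup)


if __name__ == "__main__":
    for ratio, speedup in [(1.0, 1), (3.0, 1), (3.0, 3), (9.0, 3)]:
        print(ratio, speedup, spurious_possible(ratio, speedup))
```

Under these assumptions a turbo ratio above 2.5 is enough to trip the
unmodified timer, and speeding the timer up by 3x only moves that limit
to a 3x larger ratio, which matches the caveat in the quoted text.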
