Message-ID: <37D7C6CF3E00A74B8858931C1DB2F07753713A22@SHSMSX103.ccr.corp.intel.com>
Date:   Wed, 28 Jun 2017 13:24:08 +0000
From:   "Liang, Kan" <kan.liang@...el.com>
To:     Michal Hocko <mhocko@...nel.org>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "dzickus@...hat.com" <dzickus@...hat.com>,
        "mingo@...nel.org" <mingo@...nel.org>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "babu.moger@...cle.com" <babu.moger@...cle.com>,
        "atomlin@...hat.com" <atomlin@...hat.com>,
        "prarit@...hat.com" <prarit@...hat.com>,
        "torvalds@...ux-foundation.org" <torvalds@...ux-foundation.org>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "eranian@...gle.com" <eranian@...gle.com>,
        "acme@...hat.com" <acme@...hat.com>,
        "ak@...ux.intel.com" <ak@...ux.intel.com>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: RE: [PATCH] kernel/watchdog: fix spurious hard lockups


> > From: Kan Liang <Kan.liang@...el.com>
> >
> > Some users reported spurious NMI watchdog timeouts.
> >
> > We now have more and more systems where the Turbo range is wide enough
> > that the NMI watchdog expires faster than the soft watchdog timer that
> > updates the interrupt tick the NMI watchdog relies on.
> 
> AFAIR the watchdog doesn't rely on deferred timers so this would suggest
> that a standard hrtimer can expire much later than programmed, right?

The softlockup watchdog relies on hrtimers.
The hardlockup watchdog (NMI watchdog) relies on the perf subsystem and
counts unhalted CPU cycles.
Each time the softlockup watchdog's hrtimer fires, it increments
hrtimer_interrupts.
Each time the NMI watchdog fires, it checks whether hrtimer_interrupts
has changed since the previous NMI. If it has not, it reports a hard lockup.
The design was for the softlockup hrtimer to run at 2.5 times the
rate of the NMI watchdog, which guarantees that hrtimer_interrupts is
updated before the NMI watchdog expires.
That works well if Turbo-Mode is disabled.
However, when Turbo-Mode is enabled, the unhalted CPU cycles counter
can advance much faster than expected, so the NMI watchdog can fire
even more often than the softlockup hrtimer.
The softlockup hrtimer then does not get a chance to update
hrtimer_interrupts between two NMIs, which triggers false positives.


Thanks,
Kan

> If that is the case, how come other parts of the system do not break? We do
> rely on hrtimers in many other places.
> --
> Michal Hocko
> SUSE Labs
