Message-ID: <20170621170734.GF23705@tassilo.jf.intel.com>
Date: Wed, 21 Jun 2017 10:07:34 -0700
From: Andi Kleen <ak@...ux.intel.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Kan Liang <kan.liang@...el.com>, linux-kernel@...r.kernel.org,
dzickus@...hat.com, mingo@...nel.org, akpm@...ux-foundation.org,
babu.moger@...cle.com, atomlin@...hat.com, prarit@...hat.com,
torvalds@...ux-foundation.org, peterz@...radead.org,
eranian@...gle.com, acme@...hat.com, stable@...r.kernel.org
Subject: Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups

On Wed, Jun 21, 2017 at 05:12:06PM +0200, Thomas Gleixner wrote:
> On Wed, 21 Jun 2017, kan.liang@...el.com wrote:
> >
> > #ifdef CONFIG_HARDLOCKUP_DETECTOR
> > +/*
> > + * The NMI watchdog relies on PERF_COUNT_HW_CPU_CYCLES event, which
> > + * can tick faster than the measured CPU Frequency due to Turbo mode.
> > + * That can lead to spurious timeouts.
> > + * To workaround the issue, extending the period by 3 times.
> > + */
> > u64 hw_nmi_get_sample_period(int watchdog_thresh)
> > {
> > - return (u64)(cpu_khz) * 1000 * watchdog_thresh;
> > + return (u64)(cpu_khz) * 1000 * watchdog_thresh * 3;
>
> The maximum turbo frequency of any given machine can be retrieved.
Not reliably, e.g. not under virtualization. It would also require
model-specific checks, so as soon as you have a new model and an
old kernel it could still randomly fail.
-Andi
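
For readers following the arithmetic in the quoted patch, here is a rough
user-space sketch (not kernel code) of how the sample period relates to
wall-clock time when the core runs faster than cpu_khz. The helper names
sample_period() and period_ms(), the margin parameter, and the frequencies
in the note below are assumptions made up for illustration; only the
formula cpu_khz * 1000 * watchdog_thresh (optionally * 3) comes from the
patch.

```c
#include <stdint.h>

/*
 * Mirrors the formula in the quoted patch: the number of CPU cycles the
 * perf event counts before the watchdog NMI fires.
 * margin is an illustrative parameter: 1 for the old code, 3 for the patch.
 */
static uint64_t sample_period(uint64_t cpu_khz, int watchdog_thresh, int margin)
{
    return cpu_khz * 1000 * (uint64_t)watchdog_thresh * (uint64_t)margin;
}

/*
 * Wall-clock time in milliseconds until the counter reaches the period,
 * assuming the core runs steadily at actual_khz (hypothetical value).
 * One kHz is one cycle per millisecond, so cycles / kHz gives ms.
 */
static uint64_t period_ms(uint64_t period_cycles, uint64_t actual_khz)
{
    return period_cycles / actual_khz;
}
```

With an assumed cpu_khz of 2000000 (2 GHz), watchdog_thresh of 10 and a
core actually running at 3000000 kHz, the unpatched period elapses after
about 6.7 seconds instead of the intended 10, while the 3x period elapses
after 20 seconds. A fixed multiplier needs no knowledge of the real
maximum turbo ratio, which is the point of the objection above.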