Message-ID: <CAD=FV=V1nqunM83LzSXnYiODC66tn5hjSWsUvxabf6vSO7reUQ@mail.gmail.com>
Date: Thu, 4 May 2023 15:16:23 -0700
From: Doug Anderson <dianders@...omium.org>
To: Petr Mladek <pmladek@...e.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Mark Rutland <mark.rutland@....com>,
Randy Dunlap <rdunlap@...radead.org>,
Will Deacon <will@...nel.org>,
Catalin Marinas <catalin.marinas@....com>,
Sumit Garg <sumit.garg@...aro.org>,
Daniel Thompson <daniel.thompson@...aro.org>,
Ian Rogers <irogers@...gle.com>, ravi.v.shankar@...el.com,
Marc Zyngier <maz@...nel.org>,
linux-perf-users@...r.kernel.org,
Stephane Eranian <eranian@...gle.com>,
kgdb-bugreport@...ts.sourceforge.net, ito-yuichi@...itsu.com,
linux-arm-kernel@...ts.infradead.org,
Stephen Boyd <swboyd@...omium.org>,
Masayoshi Mizuma <msys.mizuma@...il.com>,
ricardo.neri@...el.com, Lecopzer Chen <lecopzer.chen@...iatek.com>,
Chen-Yu Tsai <wens@...e.org>, Andi Kleen <ak@...ux.intel.com>,
Colin Cross <ccross@...roid.com>,
Matthias Kaehlcke <mka@...omium.org>,
Guenter Roeck <groeck@...omium.org>,
Tzung-Bi Shih <tzungbi@...omium.org>,
Alexander Potapenko <glider@...gle.com>,
AngeloGioacchino Del Regno
<angelogioacchino.delregno@...labora.com>,
Geert Uytterhoeven <geert+renesas@...der.be>,
Juergen Gross <jgross@...e.com>,
Kees Cook <keescook@...omium.org>,
Laurent Dufour <ldufour@...ux.ibm.com>,
Liam Howlett <liam.howlett@...cle.com>,
Masahiro Yamada <masahiroy@...nel.org>,
Matthias Brugger <matthias.bgg@...il.com>,
Michael Ellerman <mpe@...erman.id.au>,
Miguel Ojeda <ojeda@...nel.org>,
Nathan Chancellor <nathan@...nel.org>,
Nick Desaulniers <ndesaulniers@...gle.com>,
"Paul E. McKenney" <paulmck@...nel.org>,
Sami Tolvanen <samitolvanen@...gle.com>,
Vlastimil Babka <vbabka@...e.cz>,
Zhaoyang Huang <zhaoyang.huang@...soc.com>,
Zhen Lei <thunder.leizhen@...wei.com>,
linux-kernel@...r.kernel.org, linux-mediatek@...ts.infradead.org
Subject: Re: cpu hotplug : was: Re: [PATCH v3] hardlockup: detect hard lockups
using secondary (buddy) CPUs
Hi,
On Tue, May 2, 2023 at 8:23 AM Petr Mladek <pmladek@...e.com> wrote:
>
> On Mon 2023-05-01 08:24:46, Douglas Anderson wrote:
> > From: Colin Cross <ccross@...roid.com>
> >
> > Implement a hardlockup detector that doesn't need any extra
> > arch-specific support code to detect lockups. Instead of using
> > something arch-specific we will use the buddy system, where each CPU
> > watches out for another one. Specifically, each CPU will use its
> > softlockup hrtimer to check that the next CPU is processing hrtimer
> > interrupts by verifying that a counter is increasing.
> >
> > --- /dev/null
> > +++ b/kernel/watchdog_buddy_cpu.c
> > +int watchdog_nmi_enable(unsigned int cpu)
> > +{
> > + /*
> > + * The new CPU will be marked online before the first hrtimer interrupt
> > + * runs on it.
>
> It does not need to be the first hrtimer interrupt. The CPU might have
> been offlined/onlined repeatedly. The counter might have any value.
>
> > + * If another CPU tests for a hardlockup on the new CPU
> > + * before it has run its first hrtimer, it will get a false positive.
> > + * Touch the watchdog on the new CPU to delay the first check for at
> > + * least 3 sampling periods to guarantee one hrtimer has run on the new
> > + * CPU.
> > + */
OK, I've updated the above comment to:

	/*
	 * The new CPU will be marked online before the hrtimer interrupt
	 * gets a chance to run on it. If another CPU tests for a
	 * hardlockup on the new CPU before it has run its hrtimer
	 * interrupt, it will get a false positive. Touch the watchdog on
	 * the new CPU to delay the check for at least 3 sampling periods
	 * to guarantee one hrtimer has run on the new CPU.
	 */
> > + per_cpu(watchdog_touch, cpu) = true;
>
> We should touch also the next_cpu:
>
> /*
> * We are going to check the next CPU. Our watchdog_hrtimer
> * need not be zero if the CPU has already been online earlier.
> * Touch the watchdog on the next CPU to avoid false positive
> * if we try to check it in less than 3 interrupts.
> */
> next_cpu = watchdog_next_cpu(cpu);
> if (next_cpu < nr_cpu_ids)
> per_cpu(watchdog_touch, next_cpu) = true;
>
> Alternative would be to clear watchdog_hrtimer. But it would kind-of
> affect also the softlockup detector.
Looks reasonable. I've incorporated it.
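
For anyone following along, here is a rough userspace sketch of the two touches discussed above. It is not the patch code: every name in it (hrtimer_count, saved_count, touched, next_cpu_of, cpu_enable, check_buddy) is a hypothetical stand-in, the "next CPU" lookup is a simple wrap-around search, and a touch skips only one check rather than modelling the 3-sampling-period delay.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical stand-ins for per-CPU state; not the kernel's symbols. */
static unsigned long hrtimer_count[NR_CPUS]; /* bumped by each CPU's timer tick */
static unsigned long saved_count[NR_CPUS];   /* last count seen by a checker */
static bool touched[NR_CPUS];                /* "skip the next check" flag */
static bool online[NR_CPUS];

/* Next online CPU after cpu, wrapping around; NR_CPUS if there is none. */
static unsigned int next_cpu_of(unsigned int cpu)
{
	for (unsigned int i = 1; i < NR_CPUS; i++) {
		unsigned int c = (cpu + i) % NR_CPUS;

		if (online[c])
			return c;
	}
	return NR_CPUS;
}

/* Bring a CPU online and touch both it and the CPU it will start watching. */
static void cpu_enable(unsigned int cpu)
{
	unsigned int next;

	assert(cpu < NR_CPUS);
	online[cpu] = true;

	/* The counter may hold a stale value from an earlier online period. */
	touched[cpu] = true;

	/* The CPU we are about to watch may not have ticked since its last check. */
	next = next_cpu_of(cpu);
	if (next < NR_CPUS)
		touched[next] = true;
}

/* Returns true if the buddy of cpu made no timer progress since the last check. */
static bool check_buddy(unsigned int cpu)
{
	unsigned int next = next_cpu_of(cpu);

	if (next >= NR_CPUS)
		return false;

	if (touched[next]) {
		/* Skip this round and resynchronize the saved count. */
		touched[next] = false;
		saved_count[next] = hrtimer_count[next];
		return false;
	}

	if (hrtimer_count[next] == saved_count[next])
		return true;

	saved_count[next] = hrtimer_count[next];
	return false;
}
```

In this model, if CPU 1 comes back online with hrtimer_count[1] equal to saved_count[1], the first check_buddy(0) returns false because of the touch. Without the touch it would report a lockup right away.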