lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZHCmQXECccH7ctq2@alley>
Date:   Fri, 26 May 2023 14:29:53 +0200
From:   Petr Mladek <pmladek@...e.com>
To:     Doug Anderson <dianders@...omium.org>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Matthias Kaehlcke <mka@...omium.org>,
        kgdb-bugreport@...ts.sourceforge.net,
        Stephane Eranian <eranian@...gle.com>, mpe@...erman.id.au,
        Tzung-Bi Shih <tzungbi@...omium.org>,
        Daniel Thompson <daniel.thompson@...aro.org>,
        Mark Rutland <mark.rutland@....com>,
        linuxppc-dev@...ts.ozlabs.org, Sumit Garg <sumit.garg@...aro.org>,
        npiggin@...il.com, davem@...emloft.net,
        Marc Zyngier <maz@...nel.org>,
        Stephen Boyd <swboyd@...omium.org>, sparclinux@...r.kernel.org,
        christophe.leroy@...roup.eu,
        Catalin Marinas <catalin.marinas@....com>,
        ravi.v.shankar@...el.com, Randy Dunlap <rdunlap@...radead.org>,
        Pingfan Liu <kernelfans@...il.com>,
        Guenter Roeck <groeck@...omium.org>,
        Lecopzer Chen <lecopzer.chen@...iatek.com>,
        Ian Rogers <irogers@...gle.com>, ito-yuichi@...itsu.com,
        ricardo.neri@...el.com, linux-arm-kernel@...ts.infradead.org,
        linux-perf-users@...r.kernel.org, Will Deacon <will@...nel.org>,
        Chen-Yu Tsai <wens@...e.org>, linux-kernel@...r.kernel.org,
        Masayoshi Mizuma <msys.mizuma@...il.com>,
        Andi Kleen <ak@...ux.intel.com>,
        Colin Cross <ccross@...roid.com>
Subject: Re: [PATCH v5 14/18] watchdog/hardlockup: detect hard lockups using
 secondary (buddy) CPUs

On Thu 2023-05-25 13:08:04, Doug Anderson wrote:
> Hi,
> 
> On Thu, May 25, 2023 at 9:27 AM Petr Mladek <pmladek@...e.com> wrote:
> >
> > On Fri 2023-05-19 10:18:38, Douglas Anderson wrote:
> > > Implement a hardlockup detector that doesn't doesn't need any extra
> > > arch-specific support code to detect lockups. Instead of using
> > > something arch-specific we will use the buddy system, where each CPU
> > > watches out for another one. Specifically, each CPU will use its
> > > softlockup hrtimer to check that the next CPU is processing hrtimer
> > > interrupts by verifying that a counter is increasing.
> > >
> > > --- a/kernel/watchdog.c
> > > +++ b/kernel/watchdog.c
> > > @@ -85,7 +85,7 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
> > >
> > >  #endif /* CONFIG_HARDLOCKUP_DETECTOR */
> > >
> > > -#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
> > > +#if defined(CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER)
> > >
> > >  static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts);
> > >  static DEFINE_PER_CPU(int, hrtimer_interrupts_saved);
> > > @@ -106,6 +106,14 @@ notrace void arch_touch_nmi_watchdog(void)
> > >  }
> > >  EXPORT_SYMBOL(arch_touch_nmi_watchdog);
> > >
> > > +void watchdog_hardlockup_touch_cpu(unsigned int cpu)
> > > +{
> > > +     per_cpu(watchdog_hardlockup_touched, cpu) = true;
> > > +
> > > +     /* Match with smp_rmb() in watchdog_hardlockup_check() */
> > > +     smp_wmb();
> >
> > It is great that you described where the related barrier is.
> >
> > Another important information is what exactly is synchronized.
> > And I am actually not sure what we are synchronizing here.
> >
> > My understanding is that a write barrier should synchronize
> > related writes, for example:
> >
> >         X = ...;
> >         /* Make sure that X is modified before Y */
> >         smp_wmb();
> >         Y = ...;
> >
> > And the related read barrier should synchronize the related reads,
> > for example:
> >
> >         if (test(Y)) {
> >                 /*
> >                  * Make sure that we use the updated X when
> >                  * we saw the updated Y.
> >                  */
> >                  smp_rmb();
> >                  do_something(X);
> >          }
> >
> > IMHO, we do not need any barrier here because we have only
> > one variable "watchdog_hardlockup_touched" here.
> > watchdog_hardlockup_check() will either see the updated value
> > or not. But it does not synchronize it against any other
> > variables or values.
> 
> Fair. These barriers were present in the original buddy lockup
> detector that we've been carrying in ChromeOS but that doesn't
> necessarily mean that they were there for a good reason.
> 
> Reasoning about weakly ordered memory always makes my brain hurt and I
> never feel confident at the end that I got the right answer and, of
> course, this is coupled by the fact that if I have a logic error in my
> reasoning that it might cause a rare / subtle bug. :(

Sure. Lockless code is complicated.

> When possible I
> try to write code that uses full blown locks to make it easier to
> reason about (even if less efficient),

Makes sense. There should be a good reason to use lockless code
because it is complicated to do it right and maintain.

> but that's not necessarily
> possible here. While we obviously don't just want to sprinkle barriers
> all over the code, IMO it's not a terrible sin to put a barrier in a
> case where it makes it easier to reason about the order of things.

I understand this. Well, it is always important to describe the
the reason why the barrier was added there. Even when it is wrong,
it gives a clue what was the motivation. Otherwise, it is hard
to do any changes on the code later.

I guess that it might be more problematic for you because
you probably are not the original author.

> In any case, I guess in this case I would worry about some sort of
> ordering race when enabling / disabling the buddy lockup detector. At
> the end of the buddy's watchdog_hardlockup_enable() /
> watchdog_hardlockup_disable() we adjust the "watchdog_cpus" which
> changes buddy assignments. Without a barrier, I _think_ it would be
> possible for a new CPU to notice the change in buddies without
> noticing the touch. Does that match your understanding? Now when
> reasoning about CPUs going online/offline we need to consider this
> extra case and we have to decide if there's any chance it could lead
> to a false lockup detection. With the memory barriers here, it's a
> little easier to think about.

This makes sense. I did not think about the hotplug scenario.

Well, I suggest to move the barriers into the buddy code and describe
it there. It does not make sense to use the barriers for the perf
hardlockup.

I mean something like:

diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c
index fee45af2e5bd..ebe71dcb55e6 100644
--- a/kernel/watchdog_buddy.c
+++ b/kernel/watchdog_buddy.c
@@ -52,6 +52,13 @@ void watchdog_hardlockup_enable(unsigned int cpu)
 	if (next_cpu < nr_cpu_ids)
 		watchdog_hardlockup_touch_cpu(next_cpu);
 
+	/*
+	 * Makes sure that watchdog is touched on this CPU before
+	 * other CPUs could see it in watchdog_cpus. The counter
+	 * part is in watchdog_buddy_check_hardlockup().
+	 */
+	smp_wmb();
+
 	cpumask_set_cpu(cpu, &watchdog_cpus);
 }
 
@@ -69,6 +76,13 @@ void watchdog_hardlockup_disable(unsigned int cpu)
 	if (next_cpu < nr_cpu_ids)
 		watchdog_hardlockup_touch_cpu(next_cpu);
 
+	/*
+	 * Makes sure that watchdog is touched on the next CPU before
+	 * this CPU disappear in watchdog_cpus. The counter part is in
+	 * watchdog_buddy_check_hardlockup().
+	 */
+	smp_wmb();
+
 	cpumask_clear_cpu(cpu, &watchdog_cpus);
 }
 
@@ -89,5 +103,12 @@ void watchdog_buddy_check_hardlockup(unsigned long hrtimer_interrupts)
 	if (next_cpu >= nr_cpu_ids)
 		return;
 
+	/*
+	 * Make sure that the watchdog was touched on next CPU when
+	 * watchdog_next_cpu() returned another one because of
+	 * a change in watchdog_hardlockup_enable()/disable().
+	 */
+	smp_rmb();
+
 	watchdog_hardlockup_check(next_cpu, NULL);
 }

> Did the above convince you about keeping the barriers? If so, do you
> have any suggested comment that would make it clearer?
> 
> 
> > > +}
> > > +
> > >  static bool is_hardlockup(unsigned int cpu)
> > >  {
> > >       int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> > > @@ -443,11 +454,15 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> > >       struct pt_regs *regs = get_irq_regs();
> > >       int duration;
> > >       int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;
> > > +     unsigned long hrtimer_interrupts;
> > >
> > >       if (!watchdog_enabled)
> > >               return HRTIMER_NORESTART;
> > >
> > > -     watchdog_hardlockup_kick();
> > > +     hrtimer_interrupts = watchdog_hardlockup_kick();
> > > +
> > > +     /* test for hardlockups */
> >
> > I would omit the comment. It is not valid when perf detector is used.
> > And checking the buddy is clear from the function name.
> >
> > > +     watchdog_buddy_check_hardlockup(hrtimer_interrupts);
> >
> > I would personally move this into watchdog_hardlockup_kick().
> > watchdog_timer_fn() is already complex enough. And checking
> > the buddy when kicking a CPU makes sense.
> 
> Sure, I'll add that to my list of things to follow-up with.
> 
> 
> > Also I would not pass "hrtimer_interrupts". I guess that it is
> > just an optimization. It is an extra churn in the code. IMHO,
> > is is not wort it. This code does not need to be super optimized.
> 
> The main reason I did it is that "hrtimer_interrupts" is static to
> watchdog.c now. If I don't pass it in then I have to make it
> non-static and add it to the header. That also means anyone looking at
> the variable and figuring out how it is read/written needs to go
> search for other people that reference it. I feel like it's cleaner to
> just pass it in. If you feel strongly that I should change this then
> let me know, but otherwise I'll plan to leave this how I have it.

I do not have strong opinion. For me, the more important change is
to move watchdog_buddy_check_hardlockup() into
watchdog_hardlockup_kick(). watchdog_timer_fn() is already too complex.

> 
> > >       /* kick the softlockup detector */
> > >       if (completion_done(this_cpu_ptr(&softlockup_completion))) {
> > > --- a/lib/Kconfig.debug
> > > +++ b/lib/Kconfig.debug
> > > @@ -1035,10 +1035,55 @@ config BOOTPARAM_SOFTLOCKUP_PANIC
> > >
> > >         Say N if unsure.
> > >
> > > -config HARDLOCKUP_DETECTOR_PERF
> > > +# Both the "perf" and "buddy" hardlockup detectors count hrtimer
> > > +# interrupts. This config enables functions managing this common code.
> > > +config HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
> > >       bool
> > >       select SOFTLOCKUP_DETECTOR
> > >
> > > +config HARDLOCKUP_DETECTOR_PERF
> > > +     bool
> > > +     depends on HAVE_HARDLOCKUP_DETECTOR_PERF
> > > +     select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
> > > +
> > > +config HARDLOCKUP_DETECTOR_BUDDY
> > > +     bool
> > > +     depends on SMP
> > > +     select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
> > > +
> > > +# For hardlockup detectors you can have one directly provided by the arch
> > > +# or use a "non-arch" one. If you're using a "non-arch" one that is
> > > +# further divided the perf hardlockup detector (which, confusingly, needs
> > > +# arch-provided perf support) and the buddy hardlockup detector (which just
> > > +# needs SMP). In either case, using the "non-arch" code conflicts with
> > > +# the NMI watchdog code (which is sometimes used directly and sometimes used
> > > +# by the arch-provided hardlockup detector).
> > > +config HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
> > > +     bool
> > > +     depends on (HAVE_HARDLOCKUP_DETECTOR_PERF || SMP) && !HAVE_NMI_WATCHDOG
> > > +     default y
> > > +
> > > +config HARDLOCKUP_DETECTOR_PREFER_BUDDY
> > > +     bool "Prefer the buddy CPU hardlockup detector"
> > > +     depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH && HAVE_HARDLOCKUP_DETECTOR_PERF && SMP
> >
> > Huh, I have big troubles to scratch my head around this check:
> >
> >        HAVE_HARDLOCKUP_DETECTOR_NON_ARCH depends on HAVE_HARDLOCKUP_DETECTOR_PERF and SMP
> >
> >        and this depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH and again
> >                on HAVE_HARDLOCKUP_DETECTOR_PERF and SMP.
> 
> The goal is to have "HARDLOCKUP_DETECTOR_PREFER_BUDDY" to show up as
> an option if there is an option _other_ than the buddy. If there's not
> more than one hardlockup detector to pick from then there's no reason
> to ask the person configuring the kernel which one they'd prefer. At
> the moment, if you have an "arch" lockup detector then you're stuck
> with it, so you only get a choice if a "perf" detector is available
> and you've got SMP.
> 
> Ah, so I guess this could be simplified to:
> 
> depends on HAVE_HARDLOCKUP_DETECTOR_PERF && SMP

Yes, this is much better.

> OK, I'll add that to the list.
> 
> 
> > > +     help
> > > +       Say Y here to prefer the buddy hardlockup detector over the perf one.
> > > +
> > > +       With the buddy detector, each CPU uses its softlockup hrtimer
> > > +       to check that the next CPU is processing hrtimer interrupts by
> > > +       verifying that a counter is increasing.
> > > +
> > > +       This hardlockup detector is useful on systems that don't have
> > > +       an arch-specific hardlockup detector or if resources needed
> > > +       for the hardlockup detector are better used for other things.
> > > +
> > > +# This will select the appropriate non-arch hardlockdup detector
> > > +config HARDLOCKUP_DETECTOR_NON_ARCH
> > > +     bool
> > > +     depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
> > > +     select HARDLOCKUP_DETECTOR_BUDDY if !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY
> > > +     select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY
> > > +
> > >  #
> > >  # Enables a timestamp based low pass filter to compensate for perf based
> > >  # hard lockup detection which runs too fast due to turbo modes.
> > > @@ -1053,9 +1098,10 @@ config HARDLOCKUP_CHECK_TIMESTAMP
> > >  config HARDLOCKUP_DETECTOR
> > >       bool "Detect Hard Lockups"
> > >       depends on DEBUG_KERNEL && !S390
> >
> > Is there any reason why S390 could not or do not want to use the buddy
> > hardlockup detector.
> 
> This isn't a new dependency, but it's a good question. Looking at the
> git history, I see commit dea20a3fbdd0 ("[PATCH] Disable
> DETECT_SOFTLOCKUP for s390"). ...and it looks like the softlockup
> detector still says it's broken on s390. That would mean that the
> buddy detector is broken too.

It seems that s390 wanted to disable the watchdog completely, see
the commit  dea20a3fbdd08e5 ("[PATCH] Disable DETECT_SOFTLOCKUP for s390")
because they got too many false positives.

> 
> > > -     depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_ARCH
> > > +     depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH || HAVE_HARDLOCKUP_DETECTOR_ARCH
> > >       select LOCKUP_DETECTOR
> > > -     select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF
> > > +     select HARDLOCKUP_DETECTOR_NON_ARCH if HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
> >
> > Anyway, the configuration of the hard lockup detectors is insane and
> > this patchset makes it even worse, especially the new
> > HARDLOCKUP_DETECTOR_NON_ARCH stuff.
> >
> > It seems that sparc, powerpc and s390 are somehow special. Do you
> > still have in mind how they are distinguished using the Kconfig
> > variables?
> >
> > For example, I am pretty confused by the meaning of HAVE_NMI_WATCHDOG.
> >
> > And sparc has its own variant of
> > watchdog_hardlockup_enable()/disable(). It means that it is
> > arch-specific. Does it work with the 13th patch which made
> > watchdog_hardlockup_enable()/disable() to be watchdog-hardlockup-type
> > specific? Is is somehow related to HAVE_NMI_WATCHDOG?
> > Does this replace the entire watchdog only only the enable part?
> >
> > I think that we need to make this more straightforward. But I first
> > need to understand the existing maze of config variables.
> 
> I agree that it's confusing. I'm obviously biased, but IMO it's less
> confusing after my patchset than before. ;-) The state of the world
> before my patchset set a pretty low bar.
> 
> As far as I understand it, at an architecture-level you can choose any
> _ONE_ of the following:
> 
> a) Implement bits needed for the the "perf" hardlockup detector. x86
> has done this, some configs of powerpc do this, and arm64 now after my
> patch series. This is HAVE_HARDLOCKUP_DETECTOR_PERF.
> 
> b) Implement your own totally separate hardlockup detector that
> doesn't use any of the common "perf" code but still looks the same to
> userspace (same sysctls, etc). Only powerpc does this (in some
> configs). As per conversations in previous versions of my patch
> series, apparently powerpc's version is quite fancy and maybe someday
> people can move some of these features to the common code. This is
> HAVE_HARDLOCKUP_DETECTOR_ARCH.
> 
> c) Don't implement the full features of a hardlockup detector but
> still have the basics. In the very least, I think it doesn't support
> the sysctls "hardlockup_panic" and "hardlockup_all_cpu_backtrace". It
> doesn't support the kernel command line parameter "nmi_watchdog=". I
> don't know for sure if there are any other differences. Only sparc64
> does this. This is HAVE_NMI_WATCHDOG. Confusingly,
> HAVE_HARDLOCKUP_DETECTOR_ARCH selects HAVE_NMI_WATCHDOG.
> 
> d) Don't implement _any_ hardlockup detector of any sort. After my
> patchset you can still end up with "buddy" if you have SMP.
>
> One thing that would probably help would be to bring sparc64 to a full
> "arch" hardlockup implementation and then get rid of the special case.
> That seems a bit outside my scope, though if someone wanted to post
> patches for that I'd be willing to give them a review.

It would be nice but it might be problematic if we do not have
access to the hardware.

> I guess other than that, the best we could try to do is to rename some
> configs and/or add some subconfigs to describe certain features? Maybe
> HAVE_NMI_WATCHDOG => HAVE_HARDLOCKUP_DETECTOR_ARCH_BASIC_FEATURES
> would help? I'd love to come up with a better name for
> HAVE_HARDLOCKUP_DETECTOR_NON_ARCH but I couldn't come up with one.
> Maybe the unwieldy  "HAVE_HARDLOCKUP_DETECTOR_THAT_COUNTS_HRTIMER"?

Renaming the config variables seems to be the best solution at the moment.
IMHO, it would be nice to have something like:

  + CONFIG_HARDLOCKUP_DETECTOR for code shared by all hardlockup
    detectors

  + CONFIG_HARDLOCKUP_DETECTOR_PERF for code using kernel/watchdog_perf.c

  + CONFIG_HARDLOCKUP_DETECTOR_BUDDY for code using
	  kernel/watchdog_buddy.c

  + HAVE_HARDLOCKUP_DETECTOR_PERF set by architectures that support
	  using kernel/watchdog_perf.c

  + HAVE_HARDLOCKUP_DETECTOR_ARCH set by architectures that have
	alternative implementation of the hardlockup detector

  + CONFIG_HARDLOCKUP_DETECTOR_PREFER_BUDDY
	Allow to prefer the buddy detector when _PERF or _ARCH
	is available as well.

  + HAVE_HARDLOCKUP_PANIC
  + HAVE_HARDLOCKUP_ALL_CPU_BACKTRACE
	set when the earchitecture support these features and
	used for the sysfs interface

> If you have concrete suggestions for what would be cleaner, let me
> know and I can queue up a patch. ...or I'm happy to review a patch.

I am not sure how complicated it would be to rename the config
variables to somehing sane. I am sorry I do not have time to
prepare the patches at the moment.

I'll let Andrew to decide if he would require this cleanup to accept
the patchset.

Best Regards,
Petr

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ