lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Fri, 4 Sep 2020 14:20:50 -0700
From:   Chao Zhou <chao@...o.com>
To:     paulmck@...nel.org
Cc:     rcu@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] rcu: allow multiple stalls before panic

Thanks Paul, sounds good!

Much appreciated your guidance.

Chao

eero inc.

660 3rd St, 4th Floor

San Francisco, CA 94107



On Fri, Sep 4, 2020 at 1:37 PM Paul E. McKenney <paulmck@...nel.org> wrote:
>
> On Fri, Sep 04, 2020 at 12:40:29PM -0700, Chao Zhou wrote:
> > Thanks Paul. Appreciated it.
> >
> > Initial intent was to give users a way to make their system more
> > tolerable, but prudent enough to recover if suspicious behavior
> > reaches a watermark. If a system experiences multiple stalls in one
> > lifetime, no matter how healthy it looks or whether the stalls are
> > from different sources, we still want it to dramatically recover.
> > Please share your guidance?
>
> I have no guidance in this case.  I was just wanting to verify that the
> patch was in fact doing what you want it to.  And it sounds like it does,
> so good!
>
> I have queued this for testing and further review.  If all goes well,
> I would submit it upstream for the v5.11 merge window, that is, not the
> upcoming merge window, but the one after that.
>
>                                                         Thanx, Paul
>
> > eero inc.
> >
> > 660 3rd St, 4th Floor
> >
> > San Francisco, CA 94107
> >
> >
> >
> > On Fri, Sep 4, 2020 at 11:05 AM Paul E. McKenney <paulmck@...nel.org> wrote:
> > >
> > > On Sun, Aug 30, 2020 at 11:41:17PM -0700, chao wrote:
> > > > Some stalls are transient and system can fully recover.
> > > > Allow users to configure the number of stalls experienced
> > > > to trigger kernel Panic.
> > > >
> > > > Signed-off-by: chao <chao@...o.com>
> > >
> > > Hearing no objections, I have queued this with wordsmithing as shown
> > > below.  Please let me know if I messed something up.
> > >
> > > One question, though.  It looks like setting this to (say) 5 would panic
> > > after the fifth RCU CPU stall warning message, regardless whether all
> > > five were reporting the same RCU CPU stall event or whether they instead
> > > were five widely separated transient RCU CPU stall events, where the
> > > system fully recovered from each event.  Is this the intent?
> > >
> > >                                                         Thanx, Paul
> > >
> > > ------------------------------------------------------------------------
> > >
> > > commit e710c928fb52d8e56bc6173515805301da6aa22b
> > > Author: chao <chao@...o.com>
> > > Date:   Sun Aug 30 23:41:17 2020 -0700
> > >
> > >     rcu: Panic after fixed number of stalls
> > >
> > >     Some stalls are transient, so that system fully recovers.  This commit
> > >     therefore allows users to configure the number of stalls that must happen
> > >     in order to trigger kernel panic.
> > >
> > >     Signed-off-by: chao <chao@...o.com>
> > >     Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
> > >
> > > diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> > > index 500def6..fc2dd3f 100644
> > > --- a/include/linux/kernel.h
> > > +++ b/include/linux/kernel.h
> > > @@ -536,6 +536,7 @@ extern int panic_on_warn;
> > >  extern unsigned long panic_on_taint;
> > >  extern bool panic_on_taint_nousertaint;
> > >  extern int sysctl_panic_on_rcu_stall;
> > > +extern int sysctl_max_rcu_stall_to_panic;
> > >  extern int sysctl_panic_on_stackoverflow;
> > >
> > >  extern bool crash_kexec_post_notifiers;
> > > diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> > > index 0fde39b..228c55f 100644
> > > --- a/kernel/rcu/tree_stall.h
> > > +++ b/kernel/rcu/tree_stall.h
> > > @@ -13,6 +13,7 @@
> > >
> > >  /* panic() on RCU Stall sysctl. */
> > >  int sysctl_panic_on_rcu_stall __read_mostly;
> > > +int sysctl_max_rcu_stall_to_panic __read_mostly;
> > >
> > >  #ifdef CONFIG_PROVE_RCU
> > >  #define RCU_STALL_DELAY_DELTA          (5 * HZ)
> > > @@ -106,6 +107,11 @@ early_initcall(check_cpu_stall_init);
> > >  /* If so specified via sysctl, panic, yielding cleaner stall-warning output. */
> > >  static void panic_on_rcu_stall(void)
> > >  {
> > > +       static int cpu_stall;
> > > +
> > > +       if (++cpu_stall < sysctl_max_rcu_stall_to_panic)
> > > +               return;
> > > +
> > >         if (sysctl_panic_on_rcu_stall)
> > >                 panic("RCU Stall\n");
> > >  }
> > > diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> > > index 287862f..1bca490 100644
> > > --- a/kernel/sysctl.c
> > > +++ b/kernel/sysctl.c
> > > @@ -2651,6 +2651,17 @@ static struct ctl_table kern_table[] = {
> > >                 .extra2         = SYSCTL_ONE,
> > >         },
> > >  #endif
> > > +#if defined(CONFIG_TREE_RCU)
> > > +       {
> > > +               .procname       = "max_rcu_stall_to_panic",
> > > +               .data           = &sysctl_max_rcu_stall_to_panic,
> > > +               .maxlen         = sizeof(sysctl_max_rcu_stall_to_panic),
> > > +               .mode           = 0644,
> > > +               .proc_handler   = proc_dointvec_minmax,
> > > +               .extra1         = SYSCTL_ONE,
> > > +               .extra2         = SYSCTL_INT_MAX,
> > > +       },
> > > +#endif
> > >  #ifdef CONFIG_STACKLEAK_RUNTIME_DISABLE
> > >         {
> > >                 .procname       = "stack_erasing",

Powered by blists - more mailing lists