Message-ID: <20161129150917.tk5xkl7teveybaxa@treble>
Date:   Tue, 29 Nov 2016 09:09:17 -0600
From:   Josh Poimboeuf <jpoimboe@...hat.com>
To:     "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Vince Weaver <vincent.weaver@...ne.edu>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        "dvyukov@...gle.com" <dvyukov@...gle.com>, pmladek@...e.com
Subject: Re: perf: fuzzer BUG: KASAN: stack-out-of-bounds in __unwind_start

On Tue, Nov 29, 2016 at 06:07:34AM -0800, Paul E. McKenney wrote:
> On Tue, Nov 29, 2016 at 10:16:50AM +0100, Peter Zijlstra wrote:
> > On Mon, Nov 28, 2016 at 11:52:41PM -0600, Josh Poimboeuf wrote:
> > > > We used to do that, but the resulting NMIs were problematic on some
> > > > platforms.  Perhaps things have gotten better?
> > > 
> > > Did a little digging on git blame and found the following commit (which
> > > seems to be the cause of the KASAN warning and missing stack dump):
> > > 
> > >   bc1dce514e9b ("rcu: Don't use NMIs to dump other CPUs' stacks")
> > > 
> > > I presume this commit is still needed because of the NMI printk deadlock
> > > issues which were discussed at Kernel Summit.  I guess those issues need
> > > to be sorted out before the above commit can be reverted.
> > 
> > so printk should more or less work from NMI, esp. after:
> > 
> >   42a0bb3f7138 ("printk/nmi: generic solution for safe printk in NMI")
> 
> And of course bc1dce514e9b doesn't revert cleanly, but see hand reversion
> below.  Also, 42a0bb3f7138's commit log calls out MN10300 and Xtensa as
> needing more work.  Has that happened?

Petr M, any idea?

> But I really like the fact that RCU CPU stall warnings dump only those
> stacks that are likely to be involved, and the patch below goes back
> to dumping everyone.  Shouldn't be that hard to fix, though...

There's a new trigger_single_cpu_backtrace() function which can be used
for that.
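
As a rough illustration of that idea, here is a userspace model (not kernel
code, and not a tested patch) of dumping only the stalled CPUs: for each CPU
whose bit is still set in a quiescent-state mask, try the NMI-based backtrace
first and fall back to a remote dump if that is unsupported.  The functions
trigger_single_cpu_backtrace() and dump_cpu_task() below are local stand-ins
that only mimic the kernel functions of the same names; NR_CPUS,
arch_has_nmi_backtrace, the two counters, and dump_stalled_cpu_stacks() are
hypothetical names made up for this sketch.

```c
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 8	/* hypothetical CPU count for this model */

/* Counters so the model's behaviour can be observed. */
static int nmi_backtraces;
static int remote_dumps;

/* Hypothetical knob: does the modelled architecture support NMI backtraces? */
static bool arch_has_nmi_backtrace = true;

/*
 * Userspace stand-in for the kernel's trigger_single_cpu_backtrace().
 * Returns true if a backtrace request was "sent" to the target CPU.
 */
static bool trigger_single_cpu_backtrace(int cpu)
{
	if (!arch_has_nmi_backtrace)
		return false;
	printf("NMI backtrace requested for CPU %d\n", cpu);
	nmi_backtraces++;
	return true;
}

/* Userspace stand-in for the manual remote stack dump (dump_cpu_task()). */
static void dump_cpu_task(int cpu)
{
	printf("remote stack dump of task on CPU %d\n", cpu);
	remote_dumps++;
}

/*
 * Dump stacks only for CPUs whose bit is set in qsmask, i.e. those that
 * have not yet reported a quiescent state.  Prefer the NMI path and fall
 * back to the remote dump when the NMI path is unavailable.
 */
static void dump_stalled_cpu_stacks(unsigned long qsmask)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(qsmask & (1UL << cpu)))
			continue;
		if (!trigger_single_cpu_backtrace(cpu))
			dump_cpu_task(cpu);
	}
}
```

In this model, calling dump_stalled_cpu_stacks(0x5UL) requests backtraces for
CPUs 0 and 2 only, so the "dump only the likely suspects" behavior is kept
while the traces still come from the target CPUs when NMIs are available.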

> ------------------------------------------------------------------------
> 
> commit e7c9d76ed508fe978c6657e33f4de1b160ee4efe
> Author: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
> Date:   Tue Nov 29 05:49:06 2016 -0800
> 
>     rcu: Once again use NMI-based stack traces in stall warnings
>     
>     This commit is for all intents and purposes a revert of bc1dce514e9b
>     ("rcu: Don't use NMIs to dump other CPUs' stacks").  The reason to
>     suppose that this can now safely be reverted is the presence of
>     42a0bb3f7138 ("printk/nmi: generic solution for safe printk in NMI"),
>     which is said to have made NMI-based stack dumps safe.
>     
>     Not-yet-signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
>     Cc: Petr Mladek <pmladek@...e.com>
>     Cc: Josh Poimboeuf <jpoimboe@...hat.com>
>     Cc: Peter Zijlstra <peterz@...radead.org>
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 91a68e4e6671..d73ccd4bed86 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1396,7 +1396,10 @@ static void rcu_check_gp_kthread_starvation(struct rcu_state *rsp)
>  }
>  
>  /*
> - * Dump stacks of all tasks running on stalled CPUs.
> + * Dump stacks of all tasks running on stalled CPUs.  First try using
> + * NMIs, but fall back to manual remote stack tracing on architectures
> + * that don't support NMI-based stack dumps.  The NMI-triggered stack
> + * traces are more accurate because they are printed by the target CPU.
>   */
>  static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
>  {
> @@ -1404,6 +1407,8 @@ static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
>  	unsigned long flags;
>  	struct rcu_node *rnp;
>  
> +	if (trigger_all_cpu_backtrace())
> +		return;
>  	rcu_for_each_leaf_node(rsp, rnp) {
>  		raw_spin_lock_irqsave_rcu_node(rnp, flags);
>  		if (rnp->qsmask != 0) {
> 

-- 
Josh