linux-kernel - Re: perf: fuzzer BUG: KASAN: stack-out-of-bounds in __unwind

Message-Id: <20161129180141.GY3924@linux.vnet.ibm.com>
Date:   Tue, 29 Nov 2016 10:01:41 -0800
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Petr Mladek <pmladek@...e.com>
Cc:     Josh Poimboeuf <jpoimboe@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Vince Weaver <vincent.weaver@...ne.edu>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        "dvyukov@...gle.com" <dvyukov@...gle.com>
Subject: Re: perf: fuzzer BUG: KASAN: stack-out-of-bounds in __unwind_start

On Tue, Nov 29, 2016 at 05:12:46PM +0100, Petr Mladek wrote:
> On Tue 2016-11-29 09:09:17, Josh Poimboeuf wrote:
> > On Tue, Nov 29, 2016 at 06:07:34AM -0800, Paul E. McKenney wrote:
> > > On Tue, Nov 29, 2016 at 10:16:50AM +0100, Peter Zijlstra wrote:
> > > > On Mon, Nov 28, 2016 at 11:52:41PM -0600, Josh Poimboeuf wrote:
> > > > > > We used to do that, but the resulting NMIs were problematic on some
> > > > > > platforms.  Perhaps things have gotten better?
> > > > > 
> > > > > Did a little digging on git blame and found the following commit (which
> > > > > seems to be the cause of the KASAN warning and missing stack dump):
> > > > > 
> > > > >   bc1dce514e9b ("rcu: Don't use NMIs to dump other CPUs' stacks")
> > > > > 
> > > > > I presume this commit is still needed because of the NMI printk deadlock
> > > > > issues which were discussed at Kernel Summit.  I guess those issues need
> > > > > to be sorted out before the above commit can be reverted.
> > > > 
> > > > so printk should more or less work from NMI, esp. after:
> > > > 
> > > >   42a0bb3f7138 ("printk/nmi: generic solution for safe printk in NMI")
> > > 
> > > And of course bc1dce514e9b doesn't revert cleanly, but see hand reversion
> > > below.  Also, 42a0bb3f7138's commit log calls out MN10300 and Xtensa as
> > > needing more work.  Has that happened?
> > 
> > Petr M, any idea?
> 
> These two architectures do not support the safe printk in NMI. But
> these architectures also do not implement trigger_all_cpu_backtrace()
> and other trigger_*_backtrace() functions. Therefore these functions
> return false there.
> 
> In fact, only very few architectures implement trigger_*_backtrace().
> And only a few of them use NMI (x86, arm, tile). I have just double
> checked that these all use the safe printk in NMI.
> 
> In other words, if trigger_all_cpu_backtrace() or
> trigger_single_cpu_backtrace() returns true, it should be NMI safe
> and you could use it here.

Good, I will upgrade my commit to Signed-off-by, then.
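
For readers following along, here is a minimal userspace sketch of the
pattern discussed above: request a backtrace through
trigger_single_cpu_backtrace() and fall back to another dump method only
when it returns false. This is not the patch itself. The function bodies
are mocks, and arch_has_nmi_backtrace, dump_cpu_task_fallback(),
dump_stalled_cpus() and the plain unsigned long mask are hypothetical
names introduced purely for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 4	/* assumption: small fixed CPU count for the mock */

/* Hypothetical switch: does this "architecture" have NMI-safe backtraces? */
static bool arch_has_nmi_backtrace = true;

/* Counters so the behavior can be observed without parsing output. */
static int nmi_dumps;
static int fallback_dumps;

/*
 * Userspace mock of trigger_single_cpu_backtrace(). As described in the
 * thread, a false return means the architecture does not implement it.
 */
static bool trigger_single_cpu_backtrace(int cpu)
{
	if (!arch_has_nmi_backtrace)
		return false;
	printf("NMI backtrace requested for CPU %d\n", cpu);
	nmi_dumps++;
	return true;
}

/* Hypothetical fallback used when no NMI backtrace is available. */
static void dump_cpu_task_fallback(int cpu)
{
	printf("Fallback stack dump for CPU %d\n", cpu);
	fallback_dumps++;
}

/* Dump only the CPUs whose bit is set in stalled_mask, not every CPU. */
static void dump_stalled_cpus(unsigned long stalled_mask)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(stalled_mask & (1UL << cpu)))
			continue;
		if (!trigger_single_cpu_backtrace(cpu))
			dump_cpu_task_fallback(cpu);
	}
}
```

Calling dump_stalled_cpus(0x5UL) in this mock requests backtraces for
CPUs 0 and 2 only. Clearing arch_has_nmi_backtrace routes the same CPUs
to the fallback instead.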

> > > But I really like the fact that RCU CPU stall warnings dump only those
> > > stacks that are likely to be involved, and the patch below goes back
> > > to dumping everyone.  Shouldn't be that hard to fix, though...
> > 
> > There's a new trigger_single_cpu_backtrace() function which can be used
> > for that.
> 
> There is now also trigger_cpumask_backtrace(struct cpumask *mask),
> which lets you select more CPUs using the mask, if that is of any help.

In my experience, there is almost never a large number of CPUs stalling
a given RCU grace period.  But thank you for letting me know about
trigger_cpumask_backtrace(), as it might be useful in the future.
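
As a rough illustration of the mask-based variant, the sketch below
collects a set of stalled CPUs into one mask and issues a single
request. It is a userspace mock under stated assumptions: struct cpumask
is reduced to one unsigned long, cpumask_set_cpu() and
trigger_cpumask_backtrace() are simplified stand-ins for the kernel
functions of the same names, and dump_stalled_set() is a hypothetical
helper.

```c
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 8	/* assumption: small fixed CPU count for the mock */

/* Simplified stand-in for the kernel's struct cpumask. */
struct cpumask {
	unsigned long bits;
};

/* Mock: mark one CPU in the mask. */
static void cpumask_set_cpu(int cpu, struct cpumask *mask)
{
	mask->bits |= 1UL << cpu;
}

/* Counter so the behavior can be observed without parsing output. */
static int backtraces_requested;

/*
 * Mock of trigger_cpumask_backtrace(struct cpumask *mask). This mock
 * always reports success; a real architecture may not implement it.
 */
static bool trigger_cpumask_backtrace(struct cpumask *mask)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (mask->bits & (1UL << cpu)) {
			printf("Backtrace requested for CPU %d\n", cpu);
			backtraces_requested++;
		}
	}
	return true;
}

/* Hypothetical helper: gather stalled CPUs into one mask, one request. */
static bool dump_stalled_set(const int *stalled, int n)
{
	struct cpumask mask = { 0 };

	for (int i = 0; i < n; i++) {
		/* Ignore out-of-range CPU numbers rather than shifting by them. */
		if (stalled[i] >= 0 && stalled[i] < NR_CPUS)
			cpumask_set_cpu(stalled[i], &mask);
	}
	return trigger_cpumask_backtrace(&mask);
}
```

Duplicate CPU numbers collapse into a single bit, so each selected CPU
is asked for a backtrace once per request.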

							Thanx, Paul