linux-kernel - Re: [RFC PATCH 00/11] printk: safe printing in NMI context

Message-ID: <20140618212022.GV4669@linux.vnet.ibm.com>
Date:	Wed, 18 Jun 2014 14:20:22 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Jiri Kosina <jkosina@...e.cz>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Michal Hocko <mhocko@...e.cz>, Jan Kara <jack@...e.cz>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Dave Anderson <anderson@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Petr Mladek <pmladek@...e.cz>, Kay Sievers <kay@...y.org>
Subject: Re: [RFC PATCH 00/11] printk: safe printing in NMI context

On Wed, Jun 18, 2014 at 11:12:48PM +0200, Jiri Kosina wrote:
> On Wed, 18 Jun 2014, Paul E. McKenney wrote:
> 
> > > >  	/* Complain about tasks blocking the grace period. */
> > > > @@ -1044,8 +1041,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
> > > >  	pr_cont(" (t=%lu jiffies g=%ld c=%ld q=%lu)\n",
> > > >  		jiffies - rsp->gp_start,
> > > >  		(long)rsp->gpnum, (long)rsp->completed, totqlen);
> > > > -	if (!trigger_all_cpu_backtrace())
> > > > -		dump_stack();
> > > > +	rcu_dump_cpu_stacks(rsp);
> > > 
> > > This is prone to producing not really consistent stacktraces though, 
> > > right? As the target task is still running at the time the stack is being 
> > > walked, it might produce stacktraces that are potentially nonsensical.
> > 
> > If a CPU is stuck, the stack trace down to where it is stuck is
> > likely to be static.  But yes, there is some potential for confusion.
> > My (admittedly limited) rcutorture testing produced sensible stack traces,
> > but things might be a bit uglier in other situations.
> 
> I agree that it might work nicely for the RCU stall detector indeed. I was
> looking for a solution that'd work nicely both for RCU and for sysrq-l
> (where we can't rely on processes being stuck in any way).

Agreed.  And if some more generally useful approach appears, I will be
quite happy to adjust RCU to use it.  In the meantime, I expect that
my patch will be helpful.

							Thanx, Paul

> > > How about sending NMI to the target CPU, so that the task is actually 
> > > stopped, but printing its stacktrace from the CPU that detected the stall 
> > > while it's stopped?
> > > 
> > > That way, there is no printk()-from-NMI, but also the stacktrace is 
> > > guaranteed to be self-consistent.
> > 
> > I believe that this was what Steven was suggesting, though by using
> > tracing.  
> 
> My understanding was that Steven is suggesting using trace_printk() from 
> NMI.
> 
> > Of course, if my current approach isn't up to the job, then something 
> > like this general approach would look quite good.
> 
> Thanks,
> 
> -- 
> Jiri Kosina
> SUSE Labs
> 
