Message-ID: <alpine.LNX.2.00.1406192349400.15014@pobox.suse.cz>
Date: Thu, 19 Jun 2014 23:56:36 +0200 (CEST)
From: Jiri Kosina <jkosina@...e.cz>
To: Steven Rostedt <rostedt@...dmis.org>
cc: linux-kernel@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
Ingo Molnar <mingo@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Michal Hocko <mhocko@...e.cz>, Jan Kara <jack@...e.cz>,
Frederic Weisbecker <fweisbec@...il.com>,
Dave Anderson <anderson@...hat.com>,
Petr Mladek <pmladek@...e.cz>
Subject: Re: [RFC][PATCH 0/3] x86/nmi: Print all cpu stacks from NMI safely
On Thu, 19 Jun 2014, Steven Rostedt wrote:
> This is my proposal to print the NMI stack traces from an RCU stall safely.
> Here's the gist of it.
>
> Patch 1: move the trace_seq out of the tracing code. It's useful for other
> purposes too. Like writing from an NMI context.
>
> Patch 2: Add a per_cpu "printk_func" that printk calls. By default it calls
> vprintk_def() which does what it has always done. This allows us to
> override what printk() calls normally on a per cpu basis.
>
> Patch 3: Have the NMI handler that dumps the stack trace just change the
> printk_func to call a NMI safe printk function that writes to a per cpu
> trace_seq. When all NMI handlers chimed in, the original caller prints
> out the trace_seqs for each CPU from a printk safe context.
>
> This is much less intrusive than the other versions out there.
I agree this is less intrusive than having printk() use two versions of
the buffers and perform merging. OTOH, it doesn't really seem to be a
fully clean and systematic solution either.
I had a different idea earlier today, and Petr seems to have implemented
it already; I guess he'll be sending it out as an RFC tomorrow for
comparison.
The idea basically is to *switch* what arch_trigger_all_cpu_backtrace()
and arch_trigger_all_cpu_backtrace_handler() are doing; i.e. use the NMI
as a way to stop all the CPUs (one by one), and let the CPU that is
sending the NMIs around actually walk and dump the stacks of the CPUs
receiving the NMI IPI.
It's the most trivial approach I've been able to come up with, and it
should be usable for everybody (RCU stall detector and sysrq). The only
tricky part is: if we want pt_regs to be part of the dump as well, how to
pass those cleanly between the 'stopped' CPU and the CPU that is doing
the printing. Other than that, it's just moving a few lines of code
around, I believe.
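
To make the handshake concrete, here is a rough userspace model of it. This
is only a sketch, not kernel code and not Petr's patch: threads stand in for
CPUs, a polled flag stands in for the NMI IPI, and struct regs_snapshot,
struct cpu_slot, cpu_thread() and dump_all_cpus() are all made-up names. It
shows one possible way to hand the registers over: the stopped CPU copies
them into its own slot before announcing that it is parked, and the
initiator reads them only after seeing that announcement.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>

#define MAX_CPUS 8

/* Hypothetical stand-in for struct pt_regs: just what the dumper prints. */
struct regs_snapshot {
	unsigned long ip;
	unsigned long sp;
};

enum park_state { RUNNING, STOP_REQUESTED, PARKED, RELEASED };

struct cpu_slot {
	int cpu;
	_Atomic int state;
	struct regs_snapshot regs;	/* written by the parked CPU only */
};

static struct cpu_slot slots[MAX_CPUS];

/* Simulated CPU: waits for the "NMI", publishes its regs, then parks. */
static void *cpu_thread(void *arg)
{
	struct cpu_slot *slot = arg;
	/* Fake register contents so that each CPU is distinguishable. */
	struct regs_snapshot live = { 0x1000ul + slot->cpu, 0x8000ul + slot->cpu };

	while (atomic_load_explicit(&slot->state, memory_order_acquire) != STOP_REQUESTED)
		sched_yield();		/* normal work would run here */

	/* "NMI handler": copy regs first, then announce that we are parked. */
	slot->regs = live;
	atomic_store_explicit(&slot->state, PARKED, memory_order_release);

	/* Stay stopped until the initiator has finished dumping us. */
	while (atomic_load_explicit(&slot->state, memory_order_acquire) != RELEASED)
		sched_yield();
	return NULL;
}

/* Initiator: stop the CPUs one by one and do all the printing itself. */
static int dump_all_cpus(int ncpus, struct regs_snapshot *out)
{
	pthread_t tid[MAX_CPUS];
	int i, started = 0;

	assert(ncpus >= 1 && ncpus <= MAX_CPUS);

	for (i = 0; i < ncpus; i++) {
		slots[i].cpu = i;
		atomic_store(&slots[i].state, RUNNING);
		if (pthread_create(&tid[i], NULL, cpu_thread, &slots[i]) != 0)
			break;
		started++;
	}

	for (i = 0; i < started; i++) {
		/* Send the "NMI" to exactly one CPU. */
		atomic_store_explicit(&slots[i].state, STOP_REQUESTED, memory_order_release);

		/* The acquire load pairs with the release store in cpu_thread(). */
		while (atomic_load_explicit(&slots[i].state, memory_order_acquire) != PARKED)
			sched_yield();

		/* Safe to read: the owner is parked and will not touch regs. */
		out[i] = slots[i].regs;
		printf("CPU %d: ip=%#lx sp=%#lx\n", i, out[i].ip, out[i].sp);

		atomic_store_explicit(&slots[i].state, RELEASED, memory_order_release);
	}

	for (i = 0; i < started; i++)
		pthread_join(tid[i], NULL);
	return started;			/* number of CPUs actually dumped */
}
```

Calling dump_all_cpus(4, out) from a single thread prints one line per
simulated CPU, and only the caller ever prints. In the real thing the
initiator would additionally have to cope with a CPU that never responds,
which this model ignores.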
What do you think?
--
Jiri Kosina
SUSE Labs