lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Sun, 14 Mar 2010 11:27:53 +0100 From: Frederic Weisbecker <fweisbec@...il.com> To: Steven Rostedt <rostedt@...dmis.org> Cc: linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>, Andrew Morton <akpm@...ux-foundation.org>, Li Zefan <lizf@...fujitsu.com>, Lai Jiangshan <laijs@...fujitsu.com>, stable@...nel.org Subject: Re: [PATCH 5/5] tracing: Do not record user stack trace from NMI context On Fri, Mar 12, 2010 at 09:57:00PM -0500, Steven Rostedt wrote: > From: Steven Rostedt <srostedt@...hat.com> > > A bug was found with Li Zefan's ftrace_stress_test that caused applications > to segfault during the test. > > Placing a tracing_off() in the segfault code, and examining several > traces, I found that the following was always the case. The lock tracer > was enabled (lockdep being required) and userstack was enabled. Testing > this out, I just enabled the two, but that was not good enough. I needed > to run something else that could trigger it. Running a load like hackbench > did not work, but executing a new program would. The following would > trigger the segfault within seconds: > > # echo 1 > /debug/tracing/options/userstacktrace > # echo 1 > /debug/tracing/events/lock/enable > # while :; do ls > /dev/null ; done > > Enabling the function graph tracer and looking at what was happening > I finally noticed that all cashes happened just after an NMI. > > 1) | copy_user_handle_tail() { > 1) | bad_area_nosemaphore() { > 1) | __bad_area_nosemaphore() { > 1) | no_context() { > 1) | fixup_exception() { > 1) 0.319 us | search_exception_tables(); > 1) 0.873 us | } > [...] > 1) 0.314 us | __rcu_read_unlock(); > 1) 0.325 us | native_apic_mem_write(); > 1) 0.943 us | } > 1) 0.304 us | rcu_nmi_exit(); > [...] > 1) 0.479 us | find_vma(); > 1) | bad_area() { > 1) | __bad_area() { > > After capturing several traces of failures, all of them happened > after an NMI. Curious about this, I added a trace_printk() to the NMI > handler to read the regs->ip to see where the NMI happened. In which I > found out it was here: > > ffffffff8135b660 <page_fault>: > ffffffff8135b660: 48 83 ec 78 sub $0x78,%rsp > ffffffff8135b664: e8 97 01 00 00 callq ffffffff8135b800 <error_entry> > > What was happening is that the NMI would happen at the place that a page > fault occurred. It would call rcu_read_lock() which was traced by > the lock events, and the user_stack_trace would run. This would trigger > a page fault inside the NMI. I do not see where the CR2 register is > saved or restored in NMI handling. This means that it would corrupt > the page fault handling that the NMI interrupted. > > The reason the while loop of ls helped trigger the bug, was that > each execution of ls would cause lots of pages to be faulted in, and > increase the chances of the race happening. > > The simple solution is to not allow user stack traces in NMI context. > After this patch, I ran the above "ls" test for a couple of hours > without any issues. Without this patch, the bug would trigger in less > than a minute. > > Cc: stable@...nel.org > Reported-by: Li Zefan <lizf@...fujitsu.com> > Signed-off-by: Steven Rostedt <rostedt@...dmis.org> Wow, that's a race :) In perf this is dealt with a special copy_from_user_nmi() (see in arch/x86/kernel/cpu/perf_event.c) May be save_stack_trace_user() should use that instead of a __copy_from_user_inatomic() based thing, just to cover such NMI corner race case. > --- > kernel/trace/trace.c | 7 +++++++ > 1 files changed, 7 insertions(+), 0 deletions(-) > > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c > index 484337d..e52683f 100644 > --- a/kernel/trace/trace.c > +++ b/kernel/trace/trace.c > @@ -1284,6 +1284,13 @@ ftrace_trace_userstack(struct ring_buffer *buffer, unsigned long flags, int pc) > if (!(trace_flags & TRACE_ITER_USERSTACKTRACE)) > return; > > + /* > + * NMIs can not handle page faults, even with fix ups. > + * The save user stack can (and often does) fault. > + */ > + if (unlikely(in_nmi())) > + return; > + > event = trace_buffer_lock_reserve(buffer, TRACE_USER_STACK, > sizeof(*entry), flags, pc); > if (!event) > -- > 1.7.0 > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@...r.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists