lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 13 Jun 2012 22:55:04 -0400
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Fengguang Wu <fengguang.wu@...el.com>
Cc:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Dave Jones <davej@...hat.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Lai Jiangshan <laijs@...fujitsu.com>
Subject: Re: rcu-torture boot hang

Just a note. Please use my goodmis email and not my Red Hat email. My
Red Hat email is not checked as often, and I don't usually tag emails
there as "reply to". I only author with that email to give credit to the
one that pays me to do the work (and I don't mean just for the LWN
stats).

On Wed, Jun 13, 2012 at 08:49:00PM +0800, Fengguang Wu wrote:
> Hi Steven,
> 
> On Tue, Jun 12, 2012 at 08:11:22PM +0800, Fengguang Wu wrote:
> > Hi Paul,
> > 
> > In kernel boot tests with the attached config, I find the 3.5-rc2+
> > kernels all hang here:
> > 
> > [    6.522546] rcu-torture:--- Start of test: nreaders=2
> > nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0
> > shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0
> > fqs_stutter=3 test_boost=1/0 test_boost_interval=7
> > test_boost_duration=4 shutdown_secs=0 onoff_interval=0 onoff_holdoff=0
> 
> It turns out that commit 5963e317b1e9d2a4511503916d8fd664bb8fa8fb
> ("ftrace/x86: Do not change stacks in DEBUG when calling lockdep") is
> the root cause of this boot hang. The commit is reported by git bisect
> and confirmed to fix the boot hang by reverting in on 3.5-rc2.

I get the hang with your config too. If I disable both LOCKDEP and
IRQSOFF_TRACER which turns off TRACE_IRQFLAGS then the lockup goes away.
This is consistent with your findings, as the patch you found is only
enabled when TRACE_IRQFLAGS is on.

> 
>         [ rcu torture tests ]
> 
> ...It would hang here before reverting the commit...
> 
>         [    1.611901] Testing tracer function: PASSED

I did a bit more investigation and found that the problem comes from the
load_idt() command that is used to avoid reseting the debug stack. When
the function tracer is enabled it adds breakpoints to all locations that
it is about to trace in order to convert the nops in the functions into
calls to the tracer. But if the breakpoint handler is traced, the
breakpoint in the handler will reset the stack and cause a crash.

Your above observation is correct. Actually I added a printk in my
testing and found that the 'Testing trace function:' does output before
the lockup. I think printk changed recently that prevents it to print
out right away if a '\n' is not supplied. The function tracer self test
prints:

  'Testing tracer function: '

Runs the test, and when it succeeds it prints 'PASSED\n'. Because the
first part wasn't printed, we never saw that the function tracer was
being tested when the lockup occurred.

This is a major flaw with the new printk(). It hides what is being
tested. I need to write a patch to create a fflush() for printk.

Anyway, load_idt() shouldn't be traced, but with PARAVIRT_GUEST it makes
functions that are even always_inlined not inlined, and allows things
like native_load_idt() to be traced, and that causes bad things as we
want to call load_idt to prevent recursion of the int3 handler.


If it works, can you give me your 'Tested-by'.

Thanks!

-- Steve

diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index e5834aa..6a6d7ae 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -47,9 +47,9 @@
  */
 #if !defined(CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING) || \
     !defined(CONFIG_OPTIMIZE_INLINING) || (__GNUC__ < 4)
-# define inline		inline		__attribute__((always_inline))
-# define __inline__	__inline__	__attribute__((always_inline))
-# define __inline	__inline	__attribute__((always_inline))
+# define inline		inline		__attribute__((always_inline)) notrace
+# define __inline__	__inline__	__attribute__((always_inline)) notrace
+# define __inline	__inline	__attribute__((always_inline)) notrace
 #else
 /* A lot of inline functions can cause havoc with function tracing */
 # define inline		inline		notrace
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ