linux-kernel - [RFC][PATCH] tracing: Have stack tracer force RCU to be watching

Message-ID: <20151020121031.2f9d72c1@gandalf.local.home>
Date:	Tue, 20 Oct 2015 12:10:31 -0400
From:	Steven Rostedt <rostedt@...dmis.org>
To:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: [RFC][PATCH] tracing: Have stack tracer force RCU to be watching

Paul,

I've spent a couple of days debugging this, and finally found that my
stack tracer was calling the stack trace code, which calls
__module_address(), which trips the assertion quoted below.

Is just calling rcu_irq_enter() and rcu_irq_exit() safe to do
everywhere (with interrupts always disabled)? This patch appears to fix
the bug.
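
To make the pattern in question concrete, here is a toy user-space
sketch of it. This is not kernel code: every toy_* name is hypothetical,
and the single nesting counter is only an assumed stand-in for whatever
per-CPU state RCU really keeps. It shows nothing more than the shape of
the fix, where a callee that needs RCU to be watching is bracketed by an
enter/exit pair.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for per-CPU RCU state; NOT the kernel's real bookkeeping. */
static int toy_rcu_nesting;	/* 0 models "RCU is not watching this CPU" */
static int toy_warnings;	/* how many times the toy assertion fired */

/* Hypothetical analogue of rcu_irq_enter(): make the toy RCU watch us. */
static void toy_rcu_irq_enter(void)
{
	toy_rcu_nesting++;
}

/* Hypothetical analogue of rcu_irq_exit(): must pair with an enter. */
static void toy_rcu_irq_exit(void)
{
	assert(toy_rcu_nesting > 0);
	toy_rcu_nesting--;
}

static bool toy_rcu_is_watching(void)
{
	return toy_rcu_nesting > 0;
}

/* Models a callee that requires RCU to be watching (the module lookup). */
static void toy_save_stack_trace(void)
{
	if (!toy_rcu_is_watching())
		toy_warnings++;
}

/* Models check_stack() with and without the enter/exit bracket. */
static void toy_check_stack(bool bracket)
{
	if (bracket)
		toy_rcu_irq_enter();
	toy_save_stack_trace();
	if (bracket)
		toy_rcu_irq_exit();
}
```

Calling toy_check_stack(false) from the modelled idle state bumps
toy_warnings, while toy_check_stack(true) does not and leaves the
nesting count balanced. Whether the real pair is safe everywhere is
exactly the open question above; the sketch does not answer it.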

Peter,

I'm going to be sending a second patch that converts that from a
WARN_ON() to an open coded WARN_ON_ONCE(), because WARN_ON() also calls
the stack trace code which calls __module_address() and we end up with
an infinite warning about it. This prevented me from seeing where the
bug actually was, and crashed the box.
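
As a rough illustration of why a warn-once guard stops the loop, here
is a toy user-space sketch. It is not the planned patch: the toy_*
names are hypothetical, and setting the flag before reporting is an
assumption about what the open-coded version needs to do so that the
re-entrant call from the stack dump bails out early.

```c
#include <stdbool.h>
#include <stdio.h>

static int toy_reports;	/* number of warnings actually printed */
static int toy_checks;	/* number of times the assertion ran */

static void toy_assert_locked(bool locked);

/* Models the stack dump done by a warning, which re-enters the assertion. */
static void toy_dump_stack(void)
{
	toy_assert_locked(false);
}

/* Hypothetical open-coded "warn once": set the flag BEFORE reporting. */
static void toy_assert_locked(bool locked)
{
	static bool warned;

	toy_checks++;
	if (locked || warned)
		return;

	warned = true;		/* the recursive call below now returns early */
	toy_reports++;
	fprintf(stderr, "toy warning: lock not held\n");
	toy_dump_stack();
}
```

A first call to toy_assert_locked(false) reports once and re-enters
once; every later call only increments toy_checks. If the flag were set
after toy_dump_stack() instead, the toy would recurse without bound,
which mirrors the endless warning described above.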

-- Steve

From a2d7629048322ae62bff57f34f5f995e25ed234c Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Red Hat)" <rostedt@...dmis.org>
Date: Tue, 20 Oct 2015 11:38:08 -0400
Subject: [PATCH] tracing: Have stack tracer force RCU to be watching

The stack tracer was triggering the WARN_ON() in module.c:

 static void module_assert_mutex_or_preempt(void)
 {
 #ifdef CONFIG_LOCKDEP
	if (unlikely(!debug_locks))
		return;

	WARN_ON(!rcu_read_lock_sched_held() &&
		!lockdep_is_held(&module_mutex));
 #endif
 }

The reason is that the stack tracer traces all function calls, and some of
those calls happen while entering or exiting user space and idle. Some of
these functions are called after RCU has already stopped watching, as RCU
does not watch userspace or idle CPUs.

If a max stack is hit, then save_stack_trace() is called, which will
check module addresses and call module_assert_mutex_or_preempt(), and then
trigger the warning. The sad part is that the warning itself will also do a
stack trace and trigger the same warning. That probably should be fixed.

The warning was added by 0be964be0d45 "module: Sanitize RCU usage and
locking", but this bug has probably been around longer. The bug itself is
unlikely to cause much harm, but the new warning causes the system to lock
up.

Cc: stable@...r.kernel.org # 4.2+
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@...dmis.org>
---
 kernel/trace/trace_stack.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
index b746399ab59c..5f29402bff0f 100644
--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -88,6 +88,12 @@ check_stack(unsigned long ip, unsigned long *stack)
 	local_irq_save(flags);
 	arch_spin_lock(&max_stack_lock);

+	/*
+	 * RCU may not be watching, make it see us.
+	 * The stack trace code uses rcu_sched.
+	 */
+	rcu_irq_enter();
+
 	/* In case another CPU set the tracer_frame on us */
 	if (unlikely(!frame_size))
 		this_size -= tracer_frame;
@@ -169,6 +175,7 @@ check_stack(unsigned long ip, unsigned long *stack)
 	}

  out:
+	rcu_irq_exit();
 	arch_spin_unlock(&max_stack_lock);
 	local_irq_restore(flags);
 }
-- 
1.8.3.1
