lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Mon, 6 Aug 2018 12:44:48 -0400
From:   Steven Rostedt <rostedt@...dmis.org>
To:     "Joel Fernandes (Google)" <joel@...lfernandes.org>
Cc:     linux-kernel@...r.kernel.org, kernel-team@...roid.com,
        Ingo Molnar <mingo@...hat.com>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        paulmck@...ux.vnet.ibm.com, mathieu.desnoyers@...icios.com,
        namhyung@...nel.org, peterz@...radead.org
Subject: Re: [RFC] tracepoint: Run tracepoints even after CPU is offline

On Sun,  5 Aug 2018 20:44:37 -0700
"Joel Fernandes (Google)" <joel@...lfernandes.org> wrote:

> Commit f37755490fe9 ("tracepoints: Do not trace when cpu is offline")
> causes a problem for lockdep using tracepoint code. Once a CPU is
> offline, tracepoints donot get called, however this causes a big problem
> for lockdep probes that need to run so that IRQ annotations are marked
> correctly.
> 
> An issue is possible where while the CPU is going offline, an interrupt
> can come in and then a lockdep assert causes an annotation warning:
> 
> [  106.551354] IRQs not enabled as expected
> [  106.551785] WARNING: CPU: 1 PID: 0 at kernel/time/tick-sched.c:982
>                                          tick_nohz_idle_enter+0x99/0xb0
> [  106.552964] Modules linked in:
> [  106.553299] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W
> 
> We need tracepoints to run as late as possible. This commit tries to fix
> the issue by removing the cpu_online check in tracepoint code that was
> introduced in the mentioned commit, however now we run the risk of
> running dereferencing probes that aren't RCU protected, which gives an
> RCU warning like so on boot up:
> [    0.030159] x86: Booting SMP configuration:
> [    0.030169] .... node  #0, CPUs:      #1
> [    0.001000]
> [    0.001000] =============================
> [    0.001000] WARNING: suspicious RCU usage
> [    0.001000] 4.18.0-rc6+ #42 Not tainted
> [    0.001000] -----------------------------
> [    0.001000] ./include/trace/events/timer.h:38 suspicious
> 				rcu_dereference_check() usage!
> [    0.001000]
> [    0.001000] other info that might help us debug this:
> [    0.001000]
> [    0.001000]
> [    0.001000] RCU used illegally from offline CPU!
> [    0.001000] rcu_scheduler_active = 1, debug_locks = 1
> [    0.001000] no locks held by swapper/1/0.
> [    0.001000]
> 
> Any ideas on how we can fix this? Basically we need RCU to work here
> even after !cpu_online. I thought of just using SRCU for all tracepoints
> however that may mean we can't use tracepoints from NMI..
> 
> Tries-to-Fix: c3bc8fd637a9 ("tracing: Centralize preemptirq tracepoints and
>                              unify their usage")
> Reported-by: Masami Hiramatsu <mhiramat@...nel.org>
> Signed-off-by: Joel Fernandes (Google) <joel@...lfernandes.org>
> ---
>  include/linux/tracepoint.h | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index d9a084c72541..020885714a0f 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -365,19 +365,17 @@ extern void syscall_unregfunc(void);
>   * "void *__data, proto" as the callback prototype.
>   */
>  #define DECLARE_TRACE_NOARGS(name)					\
> -	__DECLARE_TRACE(name, void, ,					\
> -			cpu_online(raw_smp_processor_id()),		\
> +	__DECLARE_TRACE(name, void, , 1,				\
>  			void *__data, __data)
>  
>  #define DECLARE_TRACE(name, proto, args)				\
> -	__DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),		\
> -			cpu_online(raw_smp_processor_id()),		\
> +	__DECLARE_TRACE(name, PARAMS(proto), PARAMS(args), 1,		\
>  			PARAMS(void *__data, proto),			\
>  			PARAMS(__data, args))
>  
>  #define DECLARE_TRACE_CONDITION(name, proto, args, cond)		\
>  	__DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),		\
> -			cpu_online(raw_smp_processor_id()) && (PARAMS(cond)), \
> +			PARAMS(cond),					\
>  			PARAMS(void *__data, proto),			\
>  			PARAMS(__data, args))
>  

Actually, I took a look at it too, and came up with this patch. Can you
test it out? The difference is that it doesn't stop the _rcuidle
version of the trace events to not be called by "cpu_online".

Which brings up another question. Can srcu work on cpu offlined CPUs?

-- Steve

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index d9a084c72541..4aba6c807d28 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -211,8 +211,11 @@ extern void syscall_unregfunc(void);
 			__DO_TRACE(&__tracepoint_##name,		\
 				TP_PROTO(data_proto),			\
 				TP_ARGS(data_args),			\
+				cpu_online(raw_smp_processor_id()) &&	\
 				TP_CONDITION(cond), 0);			\
-		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {		\
+		if (IS_ENABLED(CONFIG_LOCKDEP) &&			\
+		    cpu_online(raw_smp_processor_id()) &&		\
+		    (cond)) {						\
 			rcu_read_lock_sched_notrace();			\
 			rcu_dereference_sched(__tracepoint_##name.funcs);\
 			rcu_read_unlock_sched_notrace();		\
@@ -365,19 +368,17 @@ extern void syscall_unregfunc(void);
  * "void *__data, proto" as the callback prototype.
  */
 #define DECLARE_TRACE_NOARGS(name)					\
-	__DECLARE_TRACE(name, void, ,					\
-			cpu_online(raw_smp_processor_id()),		\
+	__DECLARE_TRACE(name, void, , 1,				\
 			void *__data, __data)
 
 #define DECLARE_TRACE(name, proto, args)				\
-	__DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),		\
-			cpu_online(raw_smp_processor_id()),		\
+	__DECLARE_TRACE(name, PARAMS(proto), PARAMS(args), 1,		\
 			PARAMS(void *__data, proto),			\
 			PARAMS(__data, args))
 
 #define DECLARE_TRACE_CONDITION(name, proto, args, cond)		\
 	__DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),		\
-			cpu_online(raw_smp_processor_id()) && (PARAMS(cond)), \
+			PARAMS(cond),					\
 			PARAMS(void *__data, proto),			\
 			PARAMS(__data, args))
 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ