linux-kernel - Re: [PATCH -printk] printk, tracing: fix console tracepoint

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220714145324.GA24338@pathway.suse.cz>
Date:   Thu, 14 Jul 2022 16:53:24 +0200
From:   Petr Mladek <pmladek@...e.com>
To:     "Paul E. McKenney" <paulmck@...nel.org>
Cc:     Steven Rostedt <rostedt@...dmis.org>,
        Marco Elver <elver@...gle.com>,
        John Ogness <john.ogness@...utronix.de>,
        Sergey Senozhatsky <senozhatsky@...omium.org>,
        linux-kernel@...r.kernel.org, kasan-dev@...glegroups.com,
        Thomas Gleixner <tglx@...utronix.de>,
        Johannes Berg <johannes.berg@...el.com>,
        Alexander Potapenko <glider@...gle.com>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        Naresh Kamboju <naresh.kamboju@...aro.org>,
        Linux Kernel Functional Testing <lkft@...aro.org>
Subject: Re: [PATCH -printk] printk, tracing: fix console tracepoint

On Wed 2022-07-13 07:05:50, Paul E. McKenney wrote:
> On Wed, Jul 13, 2022 at 01:25:41PM +0200, Petr Mladek wrote:
> > On Tue 2022-07-12 08:16:55, Paul E. McKenney wrote:
> > > Maybe printk() is supposed to be invoked from noinstr.  It might be a
> > > special case in the tooling.  I have no idea.  ;-)
> > 
> > I think that it is ok to do _not_ support printk() in noinstr parts.
> > 
> > > However, the current SRCU read-side algorithm will tolerate being invoked
> > > from noinstr as long as it is not also an NMI handler.  Much though
> > > debugging tools might (or might not) complain.
> > > 
> > > Don't get me wrong, I can make SRCU tolerate being called while RCU is
> > > not watching.  It is not even all that complicated.  The cost is that
> > > architectures that have NMIs but do not have NMI-safe this_cpu*()
> > > operations have an SRCU reader switch from explicit smp_mb() and
> > > interrupt disabling to a cmpxchg() loop relying on the implicit barriers
> > > in cmpxchg().
> > > 
> > > For arm64, this was reportedly a win.
> > 
> > IMHO, the tracepoint in printk() is not worth slowing down other
> > important fast paths.
> > 
> > The tracepoint was moved into vprintk_store() in 5.19-rc1. It used
> > to be in console_unlock() before. The previous location was not
> > reliable by definition. Old messages might be overridden by new
> > ones before they reach console. Also messages in NMI context
> > used to be stored in per-CPU buffers. There was even bigger
> > risk that they would not reach the console.
> 
> Fair enough, works for me!

The remaining question is how to make the code safe and calm
down the warning.

My understanding is that Peter Zijlstra wants to reduce the scope
of the rcuidle code even more in the future. So, we could
do something like:

>From 24c3517dedf2a30efabe72871c188fbfffffd397 Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@...e.com>
Date: Thu, 14 Jul 2022 14:54:12 +0200
Subject: [PATCH] printk: Make console tracepoint safe in NMI() context

The commit 701850dc0c31bfadf75a0 ("printk, tracing: fix console
tracepoint") moved the tracepoint from console_unlock() to
vprintk_store(). As a result, it might be called in any
context and triggered the following warning:

  WARNING: CPU: 1 PID: 16462 at include/trace/events/printk.h:10 printk_sprint+0x81/0xda
  Modules linked in: ppdev parport_pc parport
  CPU: 1 PID: 16462 Comm: event_benchmark Not tainted 5.19.0-rc5-test+ #5
  Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
  EIP: printk_sprint+0x81/0xda
  Code: 89 d8 e8 88 fc 33 00 e9 02 00 00 00 eb 6b 64 a1 a4 b8 91 c1 e8 fd d6 ff ff 84 c0 74 5c 64 a1 14 08 92 c1 a9 00 00 f0 00 74 02 <0f> 0b 64 ff 05 14 08 92 c1 b8 e0 c4 6b c1 e8 a5 dc 00 00 89 c7 e8
  EAX: 80110001 EBX: c20a52f8 ECX: 0000000c EDX: 6d203036
  ESI: 3df6004c EDI: 00000000 EBP: c61fbd7c ESP: c61fbd70
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010006
  CR0: 80050033 CR2: b7efc000 CR3: 05b80000 CR4: 001506f0
  Call Trace:
   vprintk_store+0x24b/0x2ff
   vprintk+0x37/0x4d
   _printk+0x14/0x16
   nmi_handle+0x1ef/0x24e
   ? find_next_bit.part.0+0x13/0x13
   ? find_next_bit.part.0+0x13/0x13
   ? function_trace_call+0xd8/0xd9
   default_do_nmi+0x57/0x1af
   ? trace_hardirqs_off_finish+0x2a/0xd9
   ? to_kthread+0xf/0xf
   exc_nmi+0x9b/0xf4
   asm_exc_nmi+0xae/0x29c

It comes from:

  #define __DO_TRACE(name, args, cond, rcuidle) \
  [...]
		/* srcu can't be used from NMI */	\
		WARN_ON_ONCE(rcuidle && in_nmi());	\

It might be possible to make srcu working in NMI. But it
would be slower on some architectures. It is not worth
doing it just because of this tracepoint.

It would be possible to disable this tracepoint in NMI
or in rcuidle context. Where the rcuidle context looks
more rare and thus more acceptable to be ignored.

Alternative solution would be to move the tracepoint
back to console code. But the location is less reliable
by definition. Also the synchronization against other
tracing messages is much worse.

Let's ignore the tracepoint in rcuidle context as the least
evil solution.

There seems to be three possibilities.
Link: https://lore.kernel.org/r/20220712151655.GU1790663@paulmck-ThinkPad-P17-Gen-1

Signed-off-by: Petr Mladek <pmladek@...e.com>
---
 kernel/printk/printk.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b49c6ff6dca0..a13cf3310204 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2108,7 +2108,15 @@ static u16 printk_sprint(char *text, u16 size, int facility,
 		}
 	}
 
-	trace_console_rcuidle(text, text_len);
+	/*
+	 * trace_console_idle() is not working in NMI. printk()
+	 * is used more often in NMI than in rcuidle context.
+	 * Choose the less evil solution here.
+	 *
+	 * smp_processor_id() is reliable in rcuidle context.
+	 */
+	if (!rcu_is_idle_cpu(smp_processor_id()))
+		trace_console(text, text_len);
 
 	return text_len;
 }
-- 
2.35.3