linux-kernel - Re: [PATCH -printk] printk, tracing: fix console tracepoint

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20220712002128.GQ1790663@paulmck-ThinkPad-P17-Gen-1>
Date:   Mon, 11 Jul 2022 17:21:28 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     Marco Elver <elver@...gle.com>,
        John Ogness <john.ogness@...utronix.de>,
        Petr Mladek <pmladek@...e.com>,
        Sergey Senozhatsky <senozhatsky@...omium.org>,
        linux-kernel@...r.kernel.org, kasan-dev@...glegroups.com,
        Thomas Gleixner <tglx@...utronix.de>,
        Johannes Berg <johannes.berg@...el.com>,
        Alexander Potapenko <glider@...gle.com>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        Naresh Kamboju <naresh.kamboju@...aro.org>,
        Linux Kernel Functional Testing <lkft@...aro.org>
Subject: Re: [PATCH -printk] printk, tracing: fix console tracepoint

On Mon, Jul 11, 2022 at 06:29:18PM -0400, Steven Rostedt wrote:
> 
> I know I acked this, but I finally got a tree where it is included in my
> testing, and I hit this:
> 
> INFO: NMI handler (perf_event_nmi_handler) took too long to run: 9.860 msecs
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 16462 at include/trace/events/printk.h:10 printk_sprint+0x81/0xda
> Modules linked in: ppdev parport_pc parport
> CPU: 1 PID: 16462 Comm: event_benchmark Not tainted 5.19.0-rc5-test+ #5
> Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
> EIP: printk_sprint+0x81/0xda
> Code: 89 d8 e8 88 fc 33 00 e9 02 00 00 00 eb 6b 64 a1 a4 b8 91 c1 e8 fd d6 ff ff 84 c0 74 5c 64 a1 14 08 92 c1 a9 00 00 f0 00 74 02 <0f> 0b 64 ff 05 14 08 92 c1 b8 e0 c4 6b c1 e8 a5 dc 00 00 89 c7 e8
> EAX: 80110001 EBX: c20a52f8 ECX: 0000000c EDX: 6d203036
> ESI: 3df6004c EDI: 00000000 EBP: c61fbd7c ESP: c61fbd70
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010006
> CR0: 80050033 CR2: b7efc000 CR3: 05b80000 CR4: 001506f0
> Call Trace:
>  vprintk_store+0x24b/0x2ff
> perf: interrupt took too long (7980 > 7977), lowering kernel.perf_event_max_sample_rate to 25000
>  vprintk+0x37/0x4d
>  _printk+0x14/0x16
>  nmi_handle+0x1ef/0x24e
>  ? find_next_bit.part.0+0x13/0x13
>  ? find_next_bit.part.0+0x13/0x13
>  ? function_trace_call+0xd8/0xd9
>  default_do_nmi+0x57/0x1af
>  ? trace_hardirqs_off_finish+0x2a/0xd9
>  ? to_kthread+0xf/0xf
>  exc_nmi+0x9b/0xf4
>  asm_exc_nmi+0xae/0x29c
> 
> 
> On Tue,  3 May 2022 09:38:44 +0200
> Marco Elver <elver@...gle.com> wrote:
> 
> > Petr points out [1] that calling trace_console_rcuidle() in
> > call_console_driver() had been the wrong thing for a while, because
> > "printk() always used console_trylock() and the message was flushed to
> > the console only when the trylock succeeded. And it was always deferred
> > in NMI or when printed via printk_deferred()."
> 
> The issue is that we use "trace_console_rcuidle()" where the "_rcuidle()"
> version uses srcu, which the last I knew is not safe in NMI context.
> 
> Paul, has that changed?

On x86, both srcu_read_lock() and srcu_read_unlock() should be OK from
NMI context, give or take their use of lockdep.  Which is why we have
srcu_read_lock_notrace() and srcu_read_unlock_notrace(), which do not
use lockdep.  Which __DO_TRACE() does in fact invoke.  Ah, but you have
this: "WARN_ON_ONCE(rcuidle && in_nmi())".

Because all the world is not an x86.

For srcu_read_lock() and srcu_read_unlock() to be NMI-safe more generally,
this_cpu_inc() would need to be NMI-safe on all platforms.  This requires
it be implemented with read-modify-write atomic instructions, which
appears to be the case on arm64, s390 (for at least some configs,
maybe all depending on what the "laa" instruction does), and again x86.
The generic code relies on disabling interrrupts, which does nothing
against NMIs.

So srcu_read_lock_notrace() and srcu_read_unlock_notrace() are NMI-safe
only on arm64, s390, and x86.  So that WARN_ON_ONCE() is necessary.

> Thus, we need to make sure that printk() is always called when "rcu is
> watching" and remove the _rcuidle() part, or we do not call it from nmi
> context. Or make srcu nmi safe.

Which means making this_cpu_inc() NMI-safe.  Which could be done
by making the generic this_cpu_inc() implementation use something
like arch_atomic_fetch_add().  Or a cmpxchg() loop.  Or making
__srcu_read_lock() and srcu_read_unlock() check the architecture and Do
The Right Thing.  Maybe a CONFIG_THIS_CPU_NMI_SAFE that is set by
x86, s390, and arm64?

Is either approach something that those other architectures would be
willing to countenance?  For me, it might be worth it just to not have
to dig through many layers of this_cpu macro definitions from time
to time...  :-/

							Thanx, Paul

> For now, I'm reverting this in my local tree.
> 
> -- Steve
>