linux-kernel - Re: [PATCH -printk] printk, tracing: fix console tracepoint


Message-ID: <20220711182918.338f000f@gandalf.local.home>
Date:   Mon, 11 Jul 2022 18:29:18 -0400
From:   Steven Rostedt <rostedt@...dmis.org>
To:     Marco Elver <elver@...gle.com>,
        "Paul E. McKenney" <paulmck@...nel.org>
Cc:     John Ogness <john.ogness@...utronix.de>,
        Petr Mladek <pmladek@...e.com>,
        Sergey Senozhatsky <senozhatsky@...omium.org>,
        linux-kernel@...r.kernel.org, kasan-dev@...glegroups.com,
        Thomas Gleixner <tglx@...utronix.de>,
        Johannes Berg <johannes.berg@...el.com>,
        Alexander Potapenko <glider@...gle.com>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        Naresh Kamboju <naresh.kamboju@...aro.org>,
        Linux Kernel Functional Testing <lkft@...aro.org>
Subject: Re: [PATCH -printk] printk, tracing: fix console tracepoint


I know I acked this, but I finally got a tree where it is included in my
testing, and I hit this:

INFO: NMI handler (perf_event_nmi_handler) took too long to run: 9.860 msecs
------------[ cut here ]------------
WARNING: CPU: 1 PID: 16462 at include/trace/events/printk.h:10 printk_sprint+0x81/0xda
Modules linked in: ppdev parport_pc parport
CPU: 1 PID: 16462 Comm: event_benchmark Not tainted 5.19.0-rc5-test+ #5
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
EIP: printk_sprint+0x81/0xda
Code: 89 d8 e8 88 fc 33 00 e9 02 00 00 00 eb 6b 64 a1 a4 b8 91 c1 e8 fd d6 ff ff 84 c0 74 5c 64 a1 14 08 92 c1 a9 00 00 f0 00 74 02 <0f> 0b 64 ff 05 14 08 92 c1 b8 e0 c4 6b c1 e8 a5 dc 00 00 89 c7 e8
EAX: 80110001 EBX: c20a52f8 ECX: 0000000c EDX: 6d203036
ESI: 3df6004c EDI: 00000000 EBP: c61fbd7c ESP: c61fbd70
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010006
CR0: 80050033 CR2: b7efc000 CR3: 05b80000 CR4: 001506f0
Call Trace:
 vprintk_store+0x24b/0x2ff
perf: interrupt took too long (7980 > 7977), lowering kernel.perf_event_max_sample_rate to 25000
 vprintk+0x37/0x4d
 _printk+0x14/0x16
 nmi_handle+0x1ef/0x24e
 ? find_next_bit.part.0+0x13/0x13
 ? find_next_bit.part.0+0x13/0x13
 ? function_trace_call+0xd8/0xd9
 default_do_nmi+0x57/0x1af
 ? trace_hardirqs_off_finish+0x2a/0xd9
 ? to_kthread+0xf/0xf
 exc_nmi+0x9b/0xf4
 asm_exc_nmi+0xae/0x29c


On Tue,  3 May 2022 09:38:44 +0200
Marco Elver <elver@...gle.com> wrote:

> Petr points out [1] that calling trace_console_rcuidle() in
> call_console_driver() had been the wrong thing for a while, because
> "printk() always used console_trylock() and the message was flushed to
> the console only when the trylock succeeded. And it was always deferred
> in NMI or when printed via printk_deferred()."

The issue is that we use trace_console_rcuidle(), and the "_rcuidle()"
variant of a tracepoint uses SRCU, which, last I knew, is not safe in NMI
context.

Paul, has that changed?
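
As a rough illustration of why the warning above fires, here is a minimal
user-space sketch of the check, not the kernel's actual code. It assumes the
WARNING comes from a "rcuidle && in_nmi()" style test in the tracepoint, that
the NMI bits of preempt_count are 0x00f00000 (the constant in the
"a9 00 00 f0 00" bytes of the Code: line, read here as a test against EAX),
and that EAX in the register dump holds preempt_count. The names
model_in_nmi() and model_rcuidle_would_warn() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed NMI nesting bits of preempt_count (see the lead-in above). */
#define MODEL_NMI_MASK 0x00f00000u

/* Hypothetical stand-in for in_nmi(): true if any NMI nesting bit is set. */
static inline bool model_in_nmi(uint32_t preempt_count)
{
	return (preempt_count & MODEL_NMI_MASK) != 0;
}

/*
 * Hypothetical stand-in for the tracepoint sanity check: the _rcuidle
 * variant relies on SRCU, so using it while in NMI context is flagged.
 */
static inline bool model_rcuidle_would_warn(bool rcuidle, uint32_t preempt_count)
{
	return rcuidle && model_in_nmi(preempt_count);
}
```

Under these assumptions, passing the EAX value from the dump (0x80110001)
with rcuidle set makes model_rcuidle_would_warn() return true, while the same
value with rcuidle clear does not.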

Thus, we either need to make sure that printk() is always called when "RCU
is watching" and remove the _rcuidle() part, or not call it from NMI
context. Or make SRCU NMI safe.

For now, I'm reverting this in my local tree.

-- Steve