Message-Id: <20190507173329.17031-1-daniel.vetter@ffwll.ch>
Date:   Tue,  7 May 2019 19:33:29 +0200
From:   Daniel Vetter <daniel.vetter@...ll.ch>
To:     Intel Graphics Development <intel-gfx@...ts.freedesktop.org>
Cc:     Daniel Vetter <daniel.vetter@...ll.ch>,
        Daniel Vetter <daniel.vetter@...el.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Will Deacon <will.deacon@....com>,
        Petr Mladek <pmladek@...e.com>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        John Ogness <john.ogness@...utronix.de>,
        linux-kernel@...r.kernel.org, Nicolai Stange <nstange@...e.de>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: [PATCH] RFC: x86/smp: use printk_deferred in native_smp_send_reschedule

console_trylock, called from within printk, can be called from pretty
much anywhere, including from within try_to_wake_up. Note that this
isn't common; usually the box is in pretty bad shape at that point
already. But it really doesn't help when lockdep then jumps in and
spams the logs, potentially obscuring the backtrace we're actually
interested in. One case I've seen (slightly simplified backtrace):

 Call Trace:
  <IRQ>
  console_trylock+0xe/0x60
  vprintk_emit+0xf1/0x320
  printk+0x4d/0x69
  __warn_printk+0x46/0x90
  native_smp_send_reschedule+0x2f/0x40
  check_preempt_curr+0x81/0xa0
  ttwu_do_wakeup+0x14/0x220
  try_to_wake_up+0x218/0x5f0
  pollwake+0x6f/0x90
  credit_entropy_bits+0x204/0x310
  add_interrupt_randomness+0x18f/0x210
  handle_irq+0x67/0x160
  do_IRQ+0x5e/0x130
  common_interrupt+0xf/0xf
  </IRQ>

This alone isn't a problem, but the spinlock in the semaphore is also
still held while waking up waiters (up() -> __up() -> try_to_wake_up()
callchain), which then closes the runqueue vs. semaphore.lock loop,
and upsets lockdep, which issues a circular locking splat to dmesg.
Worse, it upsets developers, since we don't want to spam dmesg with
clutter when the machine is dying already.

This is fix attempt number 3. We've already tried to:

- make console_trylock also trylock the semaphore's spinlock. This
  works in the limited case of the console_lock use-case, but doesn't
  fix the same semaphore.lock acquisition in the up() path in
  console_unlock, which we can't avoid with a trylock.

- move the wake_up_process in up() out from under the semaphore.lock
  spinlock critical section. Again this works for the limited case of
  the console_lock, and does fully break the cycle for this lock.
  Unfortunately there are still plenty of scheduler-related locks that
  wake_up_process needs, so the loop is still there, just with a few
  fewer locks involved.

Hence this third attempt, which tries to fix this by using
printk_deferred() instead of the normal printk that WARN() uses.
native_smp_send_reschedule is only called from scheduler-related code,
which has to use printk_deferred due to this locking recursion, so
this seems consistent.
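
To illustrate the idea, here is a minimal userspace sketch of deferring
a message while a scheduler-side lock is held and emitting it only
after the lock is dropped. It is a model under stated assumptions, not
the kernel's implementation: every model_* name, the rq_lock_held flag
and the buffers are hypothetical stand-ins, and the real
printk_deferred() hands console output off to a later context rather
than to an explicit flush call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Userspace model only; all names below are hypothetical. */
#define MODEL_BUF_SIZE 256

static bool rq_lock_held;                 /* stands in for a runqueue lock */
static char deferred_buf[MODEL_BUF_SIZE]; /* messages waiting to be printed */
static size_t deferred_len;
static char console_out[MODEL_BUF_SIZE];  /* stands in for the console */

/*
 * Direct path: models printk() reaching the console, whose unlock path
 * may wake a waiter and therefore needs scheduler locks. Refuses to run
 * while the modelled runqueue lock is held, since that is the recursion
 * the patch description wants to avoid.
 */
static bool model_printk(const char *msg)
{
	if (rq_lock_held)
		return false;
	strncat(console_out, msg,
		sizeof(console_out) - strlen(console_out) - 1);
	return true;
}

/* Deferred path: only store the message, never touch the console. */
static void model_printk_deferred(const char *msg)
{
	size_t room = sizeof(deferred_buf) - 1 - deferred_len;
	size_t n = strlen(msg);

	if (n > room)
		n = room;	/* truncate rather than overflow */
	memcpy(deferred_buf + deferred_len, msg, n);
	deferred_len += n;
	deferred_buf[deferred_len] = '\0';
}

/* Runs later, from a context that holds no scheduler locks. */
static void model_flush_deferred(void)
{
	assert(!rq_lock_held);
	if (deferred_len && model_printk(deferred_buf)) {
		deferred_len = 0;
		deferred_buf[0] = '\0';
	}
}

/* Models the wake-up path calling into the reschedule helper. */
static void model_wakeup_path(int cpu, bool cpu_offline)
{
	char msg[96];

	rq_lock_held = true;
	if (cpu_offline) {
		snprintf(msg, sizeof(msg),
			 "sched: Unexpected reschedule of offline CPU#%d!\n",
			 cpu);
		model_printk_deferred(msg);
	}
	rq_lock_held = false;
}
```

Calling model_wakeup_path(3, true) leaves the modelled console untouched
and only queues the message; a later model_flush_deferred() moves it to
the console once the lock is no longer held. In this model the warning
text survives, but nothing equivalent to the WARN() backtrace is
recorded.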

It has the unfortunate downside that we're losing the backtrace though
(I didn't find a printk_deferred version of WARN, and I'm not sure
it's a bright idea to dump that much using printk_deferred.)

Keeping all the people from the console_lock/printk related attempts
on cc as fyi.

Note: We can only hit this in our CI, with a very low reproduction
rate. And right now the lockdep splat and a few other things crowd out
what actually happens in the little bit of dmesg we recover, so no
idea yet why exactly we're hitting that WARN().

Signed-off-by: Daniel Vetter <daniel.vetter@...el.com>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...hat.com>
Cc: Will Deacon <will.deacon@....com>
Cc: Petr Mladek <pmladek@...e.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@...il.com>
Cc: Steven Rostedt <rostedt@...dmis.org>
Cc: Daniel Vetter <daniel.vetter@...ll.ch>
Cc: John Ogness <john.ogness@...utronix.de>
Cc: linux-kernel@...r.kernel.org
Cc: Nicolai Stange <nstange@...e.de>
Cc: Thomas Gleixner <tglx@...utronix.de>
---
 arch/x86/kernel/smp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 04adc8d60aed..f19782786669 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -125,7 +125,8 @@ static bool smp_no_nmi_ipi = false;
 static void native_smp_send_reschedule(int cpu)
 {
 	if (unlikely(cpu_is_offline(cpu))) {
-		WARN(1, "sched: Unexpected reschedule of offline CPU#%d!\n", cpu);
+		printk_deferred(KERN_WARNING
+				"sched: Unexpected reschedule of offline CPU#%d!\n", cpu);
 		return;
 	}
 	apic->send_IPI(cpu, RESCHEDULE_VECTOR);
-- 
2.20.1